On Saturday, 13.12.2025 at 11:23 +0100, Richard Biener wrote:
> On Fri, Dec 12, 2025 at 6:41 PM Martin Uecker <[email protected]> wrote:
> >
> >
> > This version is even nicer: https://godbolt.org/z/3M4Y6Pa3h
> >
> > In both cases the compiler understands that the function is the
> > identity on integers and optimizes it to a move, but it still
> > makes an aliasing decision that is inconsistent with this basic
> > fact when this integer is back-converted to a pointer.
>
> Both show the issue is that we consider integers of unknown provenance
> to only point to global variables, never to stack ones.  This "rule" is
> derived from the idea of no code doing such a thing but eventually having
> global variables laid out at absolute addresses.  Conditional copy
> propagating of (uintptr_t)&x to the assignment site would also avoid the
> issue since the provenance is no longer transferred by an equivalence but
> an assignment.  That it works when retaining the function call (see below
> for your godbolt case inline) shows that nonlocal properly handles
> escaped "integers".  But with inlining it degenerates.
>
> This is also easier to fix than dropping integer provenance tracking.
> Simply by making 100 not have nonlocal but anything provenance (also at
> some cost).
>
> #include <stdio.h>
> #include <stdint.h>
> #include <limits.h>
>
> #ifdef NOIPA
> [[gnu::noipa]]
> #endif
> uintptr_t id(uintptr_t x)
> {
>   return x;
>   uintptr_t i = 0;
>   while (1)
>     if (++i == x) break;
>   return i;
> }
>
> int main()
> {
>   int x = 0;
>   int *p = (int*)id((uintptr_t)&x);
>   *p = 15;
>   printf("%d\n", x);
> }
>
> IMO while academically interesting the loop case isn't of practical concern,
> the conditional equivalence one is more so.

Yes, but it just illustrates nicely that once you track provenance
via integers, the usual transformations of code using integers are not
necessarily consistent with it, and the provenance information then
depends on where during optimization you extract it, which makes
it difficult to come up with a consistent model.

>
> Btw, we can properly handle "pointer difference addressing" where you
> construct a pointer to an object from the difference of two object
> pointers.  "Properly" as in, we can optimize this.
>
>   int *i = malloc (4);
>   int *j = malloc (4);
>   ptrdiff_t diff = (uintptr_t)i - (uintptr_t)j;
>   int *ip = (int *)((uintptr_t)j + diff);
>
> This was once a common way of handling pointers in sysv shared memory from
> different processes and IIRC it was important to optimize this use-case.
> The other was from Matlab generated code which plumbed C <-> Fortran by
> marshalling 64bit C pointers through Fortran routines by splitting them
> into two halves and passing them as double values.  So yes, we also track
> provenance through FP values.

I am still wondering to what extent these optimizations could
also be preserved in a strict provenance model.

Martin

>
> That said, we arrived here by optimization needs plus handling code in the
> wild correctly that's invalid with strict reading of the C standard (which
> only allows back-and-forth casting of pointer-to-integer of exactly the
> original pointer value, not any offsetted value).
>
> Richard.
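
For reference, the "pointer difference addressing" fragment quoted above can
be completed into a small self-contained sketch. The helper names rebase()
and demo_pointer_difference(), the stored values, and the error handling are
assumptions made for illustration; only the four lines with i, j, diff and ip
come from the thread. The conversion of the unsigned difference to ptrdiff_t
is implementation-defined in C and is assumed to wrap, as it does with GCC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: rebuild a pointer from a base pointer plus an
   integer difference, going through uintptr_t as in the quoted fragment. */
static int *rebase(int *base, ptrdiff_t diff)
{
    return (int *)((uintptr_t)base + (uintptr_t)diff);
}

/* Stores through the reconstructed pointer and returns what is then
   read back through the original pointer (or -1 if malloc failed). */
static int demo_pointer_difference(void)
{
    int *i = malloc(sizeof *i);
    int *j = malloc(sizeof *j);
    if (i == NULL || j == NULL) {
        free(i);
        free(j);
        return -1;
    }
    *i = 1;
    *j = 2;

    /* Difference of the two addresses, computed on integers. */
    ptrdiff_t diff = (ptrdiff_t)((uintptr_t)i - (uintptr_t)j);

    /* ip has the same address as i, but was derived from j plus diff. */
    int *ip = rebase(j, diff);
    *ip = 15;

    int seen = *i;   /* must observe the store made through ip */
    free(i);
    free(j);
    return seen;
}
```

Calling demo_pointer_difference() from main() and printing the result is
enough to compare what different optimization levels make of the store.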
>
> > Martin
> >
> > On Friday, 12.12.2025 at 18:23 +0100, Martin Uecker wrote:
> > > On Friday, 12.12.2025 at 16:16 +0100, Richard Biener wrote:
> > > > On Thu, Dec 11, 2025 at 8:33 PM Martin Uecker <[email protected]> wrote:
> > > > >
> > > > > On Thursday, 11.12.2025 at 12:48 +0100, Richard Biener wrote:
> > > > > > On Thu, Dec 11, 2025 at 7:48 AM Martin Uecker <[email protected]>
> > > > > > wrote:
> > > > > > >
> > > > > > > On Thursday, 11.12.2025 at 04:12 +0000, Krister Walfridsson via Gcc wrote:
> > > > > > > > On Wed, 10 Dec 2025, Richard Biener wrote:
> > > > > > > >
> > > > > > > > > > The problem is that in GIMPLE, a pointer does not need to
> > > > > > > > > > be in bounds. The caller could call the function with a
> > > > > > > > > > value of i such that p + i happens to be equal to &a. So,
> > > > > > > > > > as I understand it, the GIMPLE semantics do not allow the
> > > > > > > > > > pass to conclude that p + i == &a is false, unless p + i is
> > > > > > > > > > dereferenced (because dereferencing a through p + i would
> > > > > > > > > > be UB due to provenance).
> > > > > > > > >
> > > > > > > > > GIMPLE adopts most of the C pointer restrictions here thus we
> > > > > > > > > can (and do) conclude that pointers stay within an object
> > > > > > > > > when advanced.  This is used by the PTA pass whose results
> > > > > > > > > are used when we optimize your example.  You have to divert
> > > > > > > > > to integer arithmetic to circumvent this and the PTA pass,
> > > > > > > > > while tracking provenance through integers as well, does the
> > > > > > > > > right thing with this.
> > > > > > > >
> > > > > > > > Great, that is much better for smtgcc than the semantics I have
> > > > > > > > currently implemented!
> > > > > > > >
> > > > > > > > But it is not completely clear to me what "most of the C pointer
> > > > > > > > restrictions" implies.
> > > > > > > > Is the following a correct interpretation?
> > > > > > > >
> > > > > > > > 1. A pointer must contain a value that points into (or one
> > > > > > > >    past) an object corresponding to its provenance (where a
> > > > > > > >    pointer may have multiple provenances). Otherwise it
> > > > > > > >    invokes undefined behavior.
> > > > > > > >
> > > > > > > > 2. The provenance used for the result of POINTER_PLUS is the
> > > > > > > >    union of the provenances for the two arguments.
> > > > > > >
> > > > > > > Note that in C one argument would be an integer and there is no
> > > > > > > provenance on integers in C as this can not work consistently.
> > > > > > >
> > > > > > > (and I think GCC gets this wrong)
> > > > > >
> > > > > > What GCC gets "right" (right in terms of improving optimization) is
> > > > > > that (int *)(intptr_t)ptr has the same provenance as ptr.
> > > > >
> > > > > The problem is nobody could come up with a convincing model
> > > > > for this that is sound.  Currently GCC breaks the requirement
> > > > > that roundtrips through integers have to work in all cases
> > > > > because sometimes the compiler gets confused about the
> > > > > provenance of the back-converted pointer and assigns the
> > > > > wrong one.
> > > >
> > > > Does it?  I don't remember such a case, can you point me to it?
> > >
> > > I think there are a couple of PRs related to this.  One which
> > > illustrates the underlying issue nicely is this one:
> > >
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65752
> > >
> > > #include <stdio.h>
> > > #include <stdint.h>
> > > #include <limits.h>
> > >
> > > int main() {
> > >   int x = 0, *p = 0;
> > >   for (uintptr_t i = 0; ; i++) {
> > >     if (i == (uintptr_t)&x) { p = (int*)i; break; }
> > >   }
> > >   *p = 15;
> > >   printf("%d\n", x);
> > > }
> > >
> > > The loop is a no-op integer transformation that should set p to the
> > > address of x.
> > > But the compiler is not able to understand it and
> > > forgets the provenance.  The C standard requires that if you convert
> > > the same integer back, you get the same pointer.  So this is wrong.
> > >
> > >
> > > If one tried to formulate consistent rules which make this example
> > > have UB, then one would need to formulate a set of rules that exactly
> > > specifies how all possible operations on integers affect their
> > > provenance, and this ruleset might then say that copying integers leads
> > > to a loss of provenance.  But this then also means in general that
> > > all optimizations GCC does on integers would need to conform to
> > > these rules about provenance, and not naively assume that integers
> > > are just integers.  As reasoning based on value equivalency goes
> > > wrong already for pointers, I assume it is practically impossible to make
> > > it work consistently for all integer operations.
> > >
> > >
> > > Dropping provenance for integers will remove this problem completely
> > > at the cost of some optimizations, and I think we should support a
> > > correct mode at least as an option.  This dropping of provenance could
> > > also be done in the FE by inserting some __builtin or similar.
> > >
> > > >
> > > > >
> > > > > The model that *is* sound is to treat conversion to integer as
> > > > > escaped and pointers converted back from integers as pointing
> > > > > to any previously escaped provenance.
> > > > >
> > > > > LLVM also gets this wrong but my understanding is that they
> > > > > want to fix this.
> > > > >
> > > > > >
> > > > > > GCC considers literal zero to have "no" provenance (unless the
> > > > > > target claims objects can exist at address zero).  So
> > > > > > (int *)((intptr_t)ptr + 0) retains the provenance of ptr.
> > > > > > (int *)((intptr_t)ptr + 4) OTOH has provenance of ptr merged with
> > > > > > 'nonlocal' provenance (a literal address never has provenance of
> > > > > > a stack object).
> > > > > > That is, constant folding loses the fact that (void *)0 + 4 would
> > > > > > have "no" provenance (actually not sure whether the null "object"
> > > > > > is subject to pointer arithmetic constraints).
> > > > > >
> > > > > > There's one thing GCC gets wrong (see some PRs) which is
> > > > > > "conditional provenance".
> > > > > >
> > > > > >  if (p == q)
> > > > > >    /* we now should treat p and q as having unioned provenance since
> > > > > >       p can be substituted for q (and vice versa) by the compiler.  */
> > > > > >
> > > > > > I have not yet seen a good answer to this from the C pointer
> > > > > > provenance proposal folks.
> > > > >
> > > > > I am not sure what answer you want.  This optimization is unsound.
> > > > >
> > > > > I think one could retain such optimizations by adding some additional
> > > > > conditions that exclude the special case that p and q have different
> > > > > provenance but the same address.
> > > >
> > > > I don't think this is workable for GCC.  We'd have to disable all
> > > > conditional copy propagation (in the GCC case for both pointers
> > > > and integers, since the latter carry provenance).
> > >
> > > I think we should have a mode that does not carry provenance
> > > via integers.  This is also what LLVM will do as far as I know,
> > > and also what the folks working on Rust semantics wanted when
> > > I last talked to them.
> > >
> > >
> > > Martin
> > >
> > >
> > > > My other "simple"
> > > > fix would be to make sure to unify provenances of p and q when
> > > > there's an equality compare but even that's a bit difficult if you
> > > > consider (SSA form)
> > > >
> > > >  p_1 = p_2 + 1;
> > > >  if (p_1 == q_3)
> > > >    ...
> > > >
> > > > not only p_1 and q_3 would have to unify provenances but of course
> > > > also p_2 and all other pointers based on (or related to) p_1.
> > > >
> > > > So I understand why you think that conditional copy propagation
> > > > is "unsound", because it does not play well with provenances.
> > > >
> > > > Richard.
> > > >
> > > > > Martin
> > > > >
> > > > > >
> > > > > > Richard.
> > > > > >
> > > > > > >
> > > > > > > Martin
> > > > > > >
> > > > > > > >
> > > > > > > > 3. The POINTER_PLUS operation is UB if the calculation
> > > > > > > >    overflows and TYPE_OVERFLOW_WRAPS(ptr_type) is false.
> > > > > > > >
> > > > > > > > 4. The rules are the same for the calculations done in MEM_REF
> > > > > > > >    and TARGET_MEM_REF as for POINTER_PLUS.
> > > > > > > >
> > > > > > > > Question: For the TARGET_MEM_REF calculation:
> > > > > > > >    BASE + STEP * INDEX + INDEX2 + OFFSET
> > > > > > > > Is it treated as one POINTER_PLUS, i.e.
> > > > > > > >    BASE + (STEP * INDEX + INDEX2 + OFFSET)
> > > > > > > > or as two (i.e. do we care about overflow and OOB between the
> > > > > > > > two index calculations)?
> > > > > > > >
> > > > > > > >
> > > > > > > > FWIW, the vectorizer and ivopts do introduce pointers that are
> > > > > > > > outside the object (which is why I allowed it in my current
> > > > > > > > semantics)...
> > > > > > > >
> > > > > > > > /Krister
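
The model called sound earlier in the thread (a conversion to integer marks
the object as escaped, and a pointer converted back from an integer may point
to any previously escaped object) can be sketched as a toy bookkeeping
scheme. Everything in this sketch is hypothetical: the names prov_set,
model_ptr, model_state, ptr_to_int, int_to_ptr and may_alias, the one-bit-per-
object encoding, and the made-up addresses. It is not how GCC's PTA pass is
implemented; it only replays the loop example from PR 65752 inside the model.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model: each object gets one bit; a provenance is a set of bits. */
typedef uint32_t prov_set;

typedef struct {
    uintptr_t addr;   /* numeric address */
    prov_set  prov;   /* objects this pointer may refer to */
} model_ptr;

typedef struct {
    prov_set escaped; /* objects whose address was converted to an integer */
} model_state;

/* Pointer-to-integer cast: the integer itself carries no provenance,
   but every object the pointer may refer to becomes "escaped". */
static uintptr_t ptr_to_int(model_state *st, model_ptr p)
{
    st->escaped |= p.prov;
    return p.addr;
}

/* Integer-to-pointer cast: the result may refer to any escaped object,
   no matter how the integer value was computed. */
static model_ptr int_to_ptr(const model_state *st, uintptr_t v)
{
    model_ptr p = { v, st->escaped };
    return p;
}

/* Two pointers may alias if their provenance sets intersect. */
static int may_alias(model_ptr a, model_ptr b)
{
    return (a.prov & b.prov) != 0;
}

/* Replays the counting loop from the thread: returns 1 if the
   back-converted pointer may alias x, cannot alias y, and has x's address. */
static int demo_escaped_model(void)
{
    model_state st = { 0 };
    model_ptr px = { 0x1000, 1u << 0 };  /* stands for &x, made-up address */
    model_ptr py = { 0x2000, 1u << 1 };  /* stands for &y, never cast to integer */

    uintptr_t target = ptr_to_int(&st, px);
    uintptr_t i = 0;
    while (i != target)                  /* integer-only computation */
        i++;
    model_ptr p = int_to_ptr(&st, i);

    return may_alias(p, px) && !may_alias(p, py) && p.addr == px.addr;
}
```

Because the integers in this model carry no provenance at all, rewriting the
loop into a plain copy of target would not change the result, which is the
property the thread asks of a consistent model.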
