On Saturday, 13.12.2025 at 11:25 +0100, Richard Biener wrote:
> On Sat, Dec 13, 2025 at 11:23 AM Richard Biener
> <[email protected]> wrote:
> > 
> > On Fri, Dec 12, 2025 at 6:41 PM Martin Uecker <[email protected]> wrote:
> > > 
> > > 
> > > 
> > > This version is even nicer: https://godbolt.org/z/3M4Y6Pa3h
> > > 
> > > In both cases the compiler understands that the function is the
> > > identity on integers and optimizes it to a move, but it still
> > > makes an aliasing decision that is inconsistent with this basic
> > > fact when this integer is back-converted to a pointer.
> > 
> > Both show the issue is that we consider integers of unknown provenance
> > to only point to global variables, never to stack ones.  This "rule" is derived
> > from the idea of no code doing such a thing but eventually having global
> > variables laid out at absolute addresses.  Conditional copy propagating of
> > (uintptr_t)&x to the assignment site would also avoid the issue since the
> > provenance is no longer transferred by an equivalence but an assignment.
> > That it works when retaining the function call (see below for your godbolt
> > case inline) shows that nonlocal properly handles escaped "integers".
> > But with inlining it degenerates.

Yes. It is clear that these are corner cases, but I am not sure
such things could never happen in real code.   There are also provenance
issues for reallocation at the same address.

But even if not, it is a conceptual problem that hinders efforts
in formal verification.
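
To make the reallocation case concrete, here is a minimal sketch. The function name is made up for illustration, and whether the allocator really hands back the same address depends on the implementation. The point is only that the saved integer can compare equal to the address of a different, newer allocation.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: returns 1 if the second malloc reused the address
   of the first one, 0 if it did not, and -1 if an allocation failed. */
int same_address_reused(void)
{
    int *p = malloc(sizeof *p);
    if (!p)
        return -1;

    /* Save the address as a plain integer while p is still valid. */
    uintptr_t old_addr = (uintptr_t)p;
    free(p);            /* after this, p itself must not be used anymore */

    int *q = malloc(sizeof *q);
    if (!q)
        return -1;

    /* Same number, but the old object is dead and q names a new one.
       A model that attaches the old provenance to old_addr has to
       explain what a pointer rebuilt from it may access here. */
    int reused = ((uintptr_t)q == old_addr);

    *q = 42;            /* access through q is always fine */
    free(q);
    return reused;
}
```

Calling same_address_reused() on a typical allocator often returns 1, but nothing guarantees this.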

> > 
> > This is also easier to fix than dropping integer provenance tracking.  Simply
> > by making 100 not have nonlocal but anything provenance (also at some cost).
> 
> That said, it should be reasonably easy to add a -fstrict-provenance flag.  But
> I'd rather have that adhere to whatever the C or C++ standards come up with
> rather than something we make up.

The proposal currently favoured by WG14 is TS 6010.  I think we could
add a flag for this.   The idea with the TS is that it gets tried out
so that it can still be modified for integration into the IS based
on the experience gained, such as complications while implementing it or
serious performance or usability issues.
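
As a rough illustration of the "no provenance via integers" idea discussed in this thread (conversion to integer counts as an escape, and a pointer converted back from an integer may refer to any previously escaped object), here is a toy model. All names are hypothetical, it is not the TS wording and not how GCC's PTA works, and it ignores the one-past-the-end ambiguity.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy model only: every name here is made up for illustration. */
enum { MAX_OBJS = 8, NO_PROVENANCE = -1 };

struct obj {
    uintptr_t base;   /* start address of the object */
    size_t size;      /* size in bytes */
    int exposed;      /* set once a pointer to it was cast to an integer */
};

static struct obj objs[MAX_OBJS];
static int n_objs;

/* Register an object; its index serves as its provenance.
   Returns NO_PROVENANCE if the table is full. */
int model_alloc(uintptr_t base, size_t size)
{
    if (n_objs == MAX_OBJS)
        return NO_PROVENANCE;
    objs[n_objs] = (struct obj){ base, size, 0 };
    return n_objs++;
}

/* Pointer-to-integer cast: the result is a plain number, but the
   object is marked as exposed (escaped). */
uintptr_t model_ptr_to_int(int id, size_t offset)
{
    objs[id].exposed = 1;
    return objs[id].base + offset;
}

/* Integer-to-pointer cast: the provenance is that of an exposed object
   containing addr, no matter how the integer was computed. */
int model_int_to_ptr(uintptr_t addr)
{
    for (int i = 0; i < n_objs; i++)
        if (objs[i].exposed && addr >= objs[i].base
            && addr - objs[i].base < objs[i].size)
            return i;
    return NO_PROVENANCE;
}
```

In this model the counting loop from the PR is unproblematic: the integer that reaches model_int_to_ptr() carries no history, so any value equal to an exposed address yields a usable pointer.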

Martin


>  There's always -fno-tree-pta.  As said,
> I view the conditional equivalence problem as the most practical one we
> currently have (although also only on fuzzed testcases at this point).
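
For readers following along, a small sketch of the conditional equivalence situation. The function names are invented for this example, and whether a + 1 and b really share an address depends on the stack layout chosen by the compiler.

```c
/* Stores through q only when p and q compare equal. */
int store_if_equal(int *p, int *q)
{
    if (p == q) {
        /* Because the values are equal, a compiler may be tempted to
           rewrite this store as "*p = 1".  That keeps the address but
           swaps in the provenance of p, which matters when p is a
           one-past-the-end pointer of some other object. */
        *q = 1;
        return 1;
    }
    return 0;
}

/* The interesting call: p is one past a, q points to b.  Forming and
   comparing a + 1 is allowed, dereferencing it would not be. */
int adjacent_case(void)
{
    int a[1] = {0}, b[1] = {0};
    store_if_equal(a + 1, b);
    return b[0];   /* 1 if the addresses coincided, otherwise 0 */
}
```

With p and q derived from the same object the substitution is harmless. The trouble starts only in the adjacent_case() shape, where the comparison can succeed although the two pointers have different provenance.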

> 
> Richard.
> 
> > #include <stdio.h>
> > #include <stdint.h>
> > #include <limits.h>
> > 
> > #ifdef NOIPA
> > [[gnu::noipa]]
> > #endif
> > uintptr_t id(uintptr_t x)
> > {
> >   return x;
> >     uintptr_t i = 0;
> >     while (1)
> >         if (++i == x) break;
> >     return i;
> > }
> > 
> > int main()
> > {
> >     int x = 0;
> >     int *p = (int*)id((uintptr_t)&x);
> >     *p = 15;
> >     printf("%d\n", x);
> > }
> > 
> > IMO while academically interesting the loop case isn't of practical concern,
> > the conditional equivalence one is more so.
> > 
> > Btw, we can properly handle "pointer difference addressing" where you
> > construct a pointer to an object from the difference of two object pointers.
> > "Properly" as in, we can optimize this.
> > 
> >   int *i = malloc (4);
> >   int *j = malloc (4);
> >   ptrdiff_t diff = (uintptr_t)i - (uintptr_t)j;
> >   int *ip = (int *)((uintptr_t)j + diff);
> > 
> > This was once a common way of handling pointers in sysv shared memory from
> > different processes and IIRC it was important to optimize this use-case.  The
> > other was from Matlab generated code which plumbed C <-> Fortran by marshalling
> > 64-bit C pointers through Fortran routines by splitting them into two halves
> > and passing them as double values.  So yes, we also track provenance through
> > FP values.
> > 
> > That said, we arrived here by optimization needs plus handling code in the wild
> > correctly that's invalid with strict reading of the C standard (which only allows
> > back-and-forth casting of pointer-to-integer of exactly the original pointer
> > value, not any offsetted value).
> > 
> > Richard.
> > 
> > > Martin
> > > 
> > > On Friday, 12.12.2025 at 18:23 +0100, Martin Uecker wrote:
> > > > On Friday, 12.12.2025 at 16:16 +0100, Richard Biener wrote:
> > > > > On Thu, Dec 11, 2025 at 8:33 PM Martin Uecker <[email protected]> wrote:
> > > > > > 
> > > > > > On Thursday, 11.12.2025 at 12:48 +0100, Richard Biener wrote:
> > > > > > > On Thu, Dec 11, 2025 at 7:48 AM Martin Uecker <[email protected]> wrote:
> > > > > > > > 
> > > > > > > > On Thursday, 11.12.2025 at 04:12 +0000, Krister Walfridsson via Gcc wrote:
> > > > > > > > > On Wed, 10 Dec 2025, Richard Biener wrote:
> > > > > > > > > 
> > > > > > > > > > > The problem is that in GIMPLE, a pointer does not need to 
> > > > > > > > > > > be in bounds. The caller could call the function with a 
> > > > > > > > > > > value of i such that p + i happens to be equal to &a. So, 
> > > > > > > > > > > as I understand it, the GIMPLE semantics do not allow the 
> > > > > > > > > > > pass to conclude that p + i == &a is false, unless p + i 
> > > > > > > > > > > is dereferenced (because dereferencing a through p + i 
> > > > > > > > > > > would be UB due to provenance).
> > > > > > > > > > 
> > > > > > > > > > GIMPLE adopts most of the C pointer restrictions here thus 
> > > > > > > > > > we can (and do) conclude that pointers stay within an 
> > > > > > > > > > object when advanced.  This is used by the PTA pass which 
> > > > > > > > > > results are used when we optimize your example.  You have 
> > > > > > > > > > to divert to integer arithmetic to circumvent this and the 
> > > > > > > > > > PTA pass, while tracking provenance through integers as 
> > > > > > > > > > well, does the right thing with this.
> > > > > > > > > 
> > > > > > > > > Great, that is much better for smtgcc than the semantics I have
> > > > > > > > > currently implemented!
> > > > > > > > > 
> > > > > > > > > But it is not completely clear to me what "most of the C pointer
> > > > > > > > > restrictions" implies. Is the following a correct interpretation?
> > > > > > > > > 
> > > > > > > > > 1. A pointer must contain a value that points into (or one past) an
> > > > > > > > > object corresponding to its provenance (where a pointer may have
> > > > > > > > > multiple provenances). Otherwise it invokes undefined behavior.
> > > > > > > > > 
> > > > > > > > > 2. The provenance used for the result of POINTER_PLUS is the union
> > > > > > > > > of the provenances for the two arguments.
> > > > > > > > 
> > > > > > > > Note that in C one argument would be an integer and there is no
> > > > > > > > provenance on integers in C as this can not work consistently.
> > > > > > > > 
> > > > > > > > (and I think GCC gets this wrong)
> > > > > > > 
> > > > > > > What GCC gets "right" (right in terms of improving optimization) is
> > > > > > > that (int *)(intptr_t)ptr has the same provenance as ptr.
> > > > > > 
> > > > > > The problem is nobody could come up with a convincing model
> > > > > > for this that is sound.  Currently GCC breaks the requirement
> > > > > > that roundtrips through integers have to work in all cases
> > > > > > because sometimes the compiler gets confused about the
> > > > > > provenance of the back-converted pointer and assigns the
> > > > > > wrong one.
> > > > > 
> > > > > Does it?  I don't remember such a case, can you point me to it?
> > > > 
> > > > I think there are a couple of PRs related to this. One which
> > > > illustrates the underlying issue nicely is this one:
> > > > 
> > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65752
> > > > 
> > > > #include <stdio.h>
> > > > #include <stdint.h>
> > > > #include <limits.h>
> > > > 
> > > > int main() {
> > > >   int x = 0, *p = 0;
> > > >   for (uintptr_t i = 0; ; i++) {
> > > >     if (i == (uintptr_t)&x) { p = (int*)i; break; }
> > > >   }
> > > >   *p = 15;
> > > >   printf("%d\n", x);
> > > > }
> > > > 
> > > > The loop is a no-op integer transformation that should set p to
> > > > &x.  But the compiler is not able to understand it and
> > > > forgets the provenance.  The C standard requires that if you transform
> > > > the same integer back, you get the same pointer.  So this is wrong.
> > > > 
> > > > 
> > > > If one tried to formulate consistent rules which make this example
> > > > have UB, then you would need to formulate some set of rules that exactly
> > > > specifies how all possible operations on integers affect their
> > > > provenance, and this ruleset might then say that copying integers leads
> > > > to a loss of provenance.  But this then also means in general that
> > > > all optimizations GCC does on integers would need to conform to
> > > > these rules about provenance, and not naively assume that integers
> > > > are just integers.   As reasoning based on value equivalency goes
> > > > wrong already for pointers, I assume it is practically impossible to
> > > > make it work consistently for all integer operations.
> > > > 
> > > > 
> > > > Dropping provenance for integers will remove this problem completely
> > > > at the cost of some optimizations, and I think we should support a
> > > > correct mode at least as an option.   This dropping of provenance could
> > > > also be done in the FE by inserting some __builtin or similar.
> > > > 
> > > > > 
> > > > > > 
> > > > > > The model that *is* sound is to treat conversion to integer as
> > > > > > escaped and pointers converted back from integers as pointing
> > > > > > to any previously escaped provenance.
> > > > > > 
> > > > > > LLVM also gets this wrong but my understanding is that they
> > > > > > want to fix this.
> > > > > > 
> > > > > > > 
> > > > > > > GCC considers literal zero to have "no" provenance (unless the target
> > > > > > > claims objects can exist at address zero).  So
> > > > > > > (int *)((intptr_t)ptr + 0) retains the provenance of ptr.
> > > > > > > (int *)((intptr_t)ptr + 4) OTOH has provenance of ptr merged with
> > > > > > > 'nonlocal' provenance (a literal address never has provenance of a
> > > > > > > stack object).  That is, constant folding loses the fact that
> > > > > > > (void *)0 + 4 would have "no" provenance (actually not sure whether
> > > > > > > the null "object" is subject to pointer arithmetic constraints).
> > > > > > > 
> > > > > > > There's one thing GCC gets wrong (see some PRs) which is
> > > > > > > "conditional provenance".
> > > > > > > 
> > > > > > >  if (p == q)
> > > > > > >    /* we now should treat p and q as having unioned provenance since
> > > > > > >       p can be substituted for q (and vice versa) by the compiler.  */
> > > > > > > 
> > > > > > > I have not yet seen a good answer to this from the C pointer
> > > > > > > provenance proposal folks.
> > > > > > 
> > > > > > I am not sure what answer you want. This optimization is unsound.
> > > > > > 
> > > > > > I think one could retain such optimizations by adding some additional
> > > > > > conditions that exclude the special case that p and q have different
> > > > > > provenance but the same address.
> > > > > 
> > > > > I don't think this is workable for GCC.  We'd have to disable all
> > > > > conditional copy propagation (in the GCC case for both pointers
> > > > > and integers, since the latter carry provenance).
> > > > 
> > > > I think we should have a mode that does not carry provenance
> > > > via integers.  This is also what LLVM will do as far as I know,
> > > > and also what the folks working on Rust semantics wanted when
> > > > I last talked to them.
> > > > 
> > > > 
> > > > Martin
> > > > 
> > > > 
> > > > > My other "simple"
> > > > > fix would be to make sure to unify provenances of p and q when
> > > > > there's an equality compare but even that's a bit difficult if you
> > > > > consider (SSA form)
> > > > > 
> > > > >   p_1 = p_2 + 1;
> > > > >   if (p_1 == q_3)
> > > > >     ...
> > > > > 
> > > > > not only p_1 and q_3 would have to unify provenances but of course
> > > > > also p_2 and all other pointers based on (or related to) p_1.
> > > > > 
> > > > > So I understand why you think that conditional copy propagation
> > > > > is "unsound", because it does not play well with provenances.
> > > > > 
> > > > > Richard.
> > > > > 
> > > > > > Martin
> > > > > > 
> > > > > > > 
> > > > > > > Richard.
> > > > > > > 
> > > > > > > > 
> > > > > > > > Martin
> > > > > > > > 
> > > > > > > > 
> > > > > > > > > 
> > > > > > > > > 3. The POINTER_PLUS operation is UB if the calculation overflows and
> > > > > > > > > TYPE_OVERFLOW_WRAPS(ptr_type) is false.
> > > > > > > > > 
> > > > > > > > > 4. The rules are the same for the calculations done in MEM_REF and
> > > > > > > > > TARGET_MEM_REF as for POINTER_PLUS.
> > > > > > > > > 
> > > > > > > > > Question: For the TARGET_MEM_REF calculation:
> > > > > > > > >    BASE + STEP * INDEX + INDEX2 + OFFSET
> > > > > > > > > Is it treated as one POINTER_PLUS, i.e.
> > > > > > > > >    BASE + (STEP * INDEX + INDEX2 + OFFSET)
> > > > > > > > > or as two (i.e. do we care about overflow and OOB between the two
> > > > > > > > > index calculations)?
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > FWIW, the vectorizer and ivopts do introduce pointers that are
> > > > > > > > > outside the object (which is why I allowed it in my current
> > > > > > > > > semantics)...
> > > > > > > > > 
> > > > > > > > >     /Krister
