On Sat, Dec 13, 2025 at 11:23 AM Richard Biener wrote:
>
> On Fri, Dec 12, 2025 at 6:41 PM Martin Uecker wrote:
> >
> >
> >
> > This version is even nicer: https://godbolt.org/z/3M4Y6Pa3h
> >
> > In both cases the compiler understands that the function is the
> > identity on integers and optimizes it to a move, but it still
> > makes an aliasing decision that is inconsistent with this basic
> > fact when this integer is back-converted to a pointer.
>
> Both show that the issue is that we consider integers of unknown provenance
> to point only to global variables, never to stack ones. This "rule" is
> derived from the idea that no code does such a thing, except possibly for
> global variables laid out at absolute addresses. Conditional copy
> propagation of (uintptr_t)&x to the assignment site would also avoid the
> issue, since the provenance is then no longer transferred by an equivalence
> but by an assignment. That it works when the function call is retained (see
> below for your godbolt case inline) shows that nonlocal properly handles
> escaped "integers". But with inlining it degenerates.
>
> This is also easier to fix than dropping integer provenance tracking: simply
> make 100 have "anything" provenance instead of nonlocal (also at some cost).

That said, it should be reasonably easy to add a -fstrict-provenance flag. But
I'd rather have it adhere to whatever the C or C++ standards come up with
than to something we make up. There's always -fno-tree-pta. As said,
I view the conditional equivalence problem as the most practical one we
currently have (although also only on fuzzed testcases at this point).

Richard.

> #include <stdio.h>
> #include <stdint.h>
> #include <limits.h>
>
> #ifdef NOIPA
> [[gnu::noipa]]
> #endif
> uintptr_t id(uintptr_t x)
> {
>   return x;
>   /* Unreachable below: the same identity, computed by counting up to x.  */
>   uintptr_t i = 0;
>   while (1)
>     if (++i == x) break;
>   return i;
> }
>
> int main()
> {
>   int x = 0;
>   int *p = (int*)id((uintptr_t)&x);  /* round trip through an integer */
>   *p = 15;
>   printf("%d\n", x);                 /* a correct compilation prints 15 */
> }
>
> IMO, while academically interesting, the loop case isn't of practical
> concern; the conditional equivalence one is more so.
>
> Btw, we can properly handle "pointer difference addressing" where you
> construct a pointer to an object from the difference of two object pointers.
> "Properly" as in, we can optimize this.
>
> int *i = malloc (4);
> int *j = malloc (4);
> ptrdiff_t diff = (uintptr_t)i - (uintptr_t)j;
> int *ip = (int *)((uintptr_t)j + diff);
>
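A self-contained version of the pointer-difference snippet, as a minimal sketch. The helper names are hypothetical; the difference is kept in a uintptr_t (rather than the ptrdiff_t above) so that the unsigned wrap-around is well defined, and the round trip relies on the implementation-defined pointer/integer conversions of a flat address space.

```c
#include <stdint.h>
#include <stdlib.h>

/* Rebuild a pointer to the object behind i using only j and the integer
   difference of the two addresses.  The result is not derived from i by
   any pointer operation, so the compiler must track provenance through
   the integer arithmetic to see that it may alias *i.  */
int *rebuild_from_difference(int *i, int *j)
{
  /* Unsigned subtraction wraps, so this is well defined even if j > i.  */
  uintptr_t diff = (uintptr_t)i - (uintptr_t)j;
  return (int *)((uintptr_t)j + diff);
}

/* Allocate two ints, store through the rebuilt pointer and read back
   through the original one.  Returns -1 if allocation fails.  */
int store_through_difference(void)
{
  int *i = malloc(sizeof *i);
  int *j = malloc(sizeof *j);
  if (!i || !j)
    {
      free(i);
      free(j);
      return -1;
    }
  *i = 0;
  int *ip = rebuild_from_difference(i, j);
  *ip = 42;            /* must be visible through i */
  int result = *i;
  free(i);
  free(j);
  return result;
}
```

If provenance is tracked through the subtraction and addition, the store through ip is not reordered past the load from i, and the function returns 42.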
> This was once a common way of handling pointers in SysV shared memory from
> different processes, and IIRC it was important to optimize this use case.
> The other was Matlab-generated code which plumbed C <-> Fortran by
> marshalling 64-bit C pointers through Fortran routines, splitting them into
> two halves and passing them as double values. So yes, we also track
> provenance through FP values.
>
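A minimal sketch of the Matlab-style marshalling described above, assuming pointers fit in 64 bits. The function names are hypothetical and not taken from any real C <-> Fortran interface. Each 32-bit half is below 2^53, so converting it to double and back is exact.

```c
#include <stdint.h>

/* Hypothetical helper: split a pointer into two doubles, each holding one
   32-bit half of the address.  Assumes uintptr_t is at most 64 bits.  */
void pointer_to_doubles(const void *p, double *hi, double *lo)
{
  uint64_t bits = (uint64_t)(uintptr_t)p;
  *hi = (double)(uint32_t)(bits >> 32);
  *lo = (double)(uint32_t)bits;
}

/* Hypothetical helper: reassemble the address from the two halves.  The
   doubles hold exact integers in [0, 2^32), so the casts back are lossless.  */
void *doubles_to_pointer(double hi, double lo)
{
  uint64_t bits = ((uint64_t)(uint32_t)hi << 32) | (uint64_t)(uint32_t)lo;
  return (void *)(uintptr_t)bits;
}
```

Because GCC tracks provenance through FP values, a pointer rebuilt this way keeps the provenance of the original instead of becoming an arbitrary integer-derived pointer.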
> That said, we arrived here through optimization needs plus correctly
> handling code in the wild that's invalid under a strict reading of the C
> standard (which only allows back-and-forth casting of pointer-to-integer of
> exactly the original pointer value, not any offset value).
>
> Richard.
>
> > Martin
> >
> > On Fri, 12 Dec 2025 at 18:23 +0100, Martin Uecker wrote:
> > > On Fri, 12 Dec 2025 at 16:16 +0100, Richard Biener wrote:
> > > > On Thu, Dec 11, 2025 at 8:33 PM Martin Uecker wrote:
> > > > >
> > > > > On Thu, 11 Dec 2025 at 12:48 +0100, Richard Biener wrote:
> > > > > > On Thu, Dec 11, 2025 at 7:48 AM Martin Uecker wrote:
> > > > > > >
> > > > > > > On Thu, 11 Dec 2025 at 04:12 +0000, Krister Walfridsson via Gcc wrote:
> > > > > > > > On Wed, 10 Dec 2025, Richard Biener wrote:
> > > > > > > >
> > > > > > > > > > The problem is that in GIMPLE, a pointer does not need to
> > > > > > > > > > be in bounds. The caller could call the function with a
> > > > > > > > > > value of i such that p + i happens to be equal to &a. So,
> > > > > > > > > > as I understand it, the GIMPLE semantics do not allow the
> > > > > > > > > > pass to conclude that p + i == &a is false, unless p + i is
> > > > > > > > > > dereferenced (because dereferencing a through p + i would
> > > > > > > > > > be UB due to provenance).
> > > > > > > > >
> > > > > > > > > GIMPLE adopts most of the C pointer restrictions here, so we
> > > > > > > > > can (and do) conclude that pointers stay within an object
> > > > > > > > > when advanced. This is used by the PTA pass, whose results
> > > > > > > > > are used when we optimize your example. You have to divert
> > > > > > > > > to integer arithmetic to circumvent this, and the PTA pass,
> > > > > > > > > which tracks provenance through integers as well, does the
> > > > > > > > > right thing with this.
> > > > > > > >
> > > > > > > > Great, that is much better for smtgcc than the semantics I have
> > > > > > > > currently implemented!
> > > > > > > >
> > > > > > > > But it is not completely clear to me what "most of the C pointer
> > > > > > > > restrictions" implies. Is the following a correct interpretation?
> > > > > > > >
> > > > > > > > 1. A pointer must contain a value that points into (or one past)
> > > > > > > >    an object corresponding to its provenance (where a pointer may
> > > > > > > >    have multiple provenances). Otherwise it invokes undefined
> > > > > > > >    behavior.
> > > > > > > >
> > > > > > > > 2. The provenance used for the result of POINTER_PLUS is the
> > > > > > > >    union of the provenances for the two arguments.
> > > > > > >
> > > > > > > Note that in C one argument would be an integer, and there is no
> > > > > > > provenance on integers in C, as this cannot work consistently.
> > > > > > >
> > > > > > > (and I think GCC gets this wrong)
> > > > > >
> > > > > > What GCC gets "right" (right in terms of improving optimization) is
> > > > > > that (int *)(intptr_t)ptr has the same provenance as ptr.
> > > > >
> > > > > The problem is that nobody could come up with a convincing model
> > > > > for this that is sound. Currently GCC breaks the requirement
> > > > > that round trips through integers have to work in all cases,
> > > > > because sometimes the compiler gets confused about the
> > > > > provenance of the back-converted pointer and assigns the
> > > > > wrong one.
> > > >
> > > > Does it? I don't remember such a case, can you point me to it?
> > >
> > > I think there are a couple of PRs related to this. One which
> > > illustrates the underlying issue nicely is this one:
> > >
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65752
> > >
> > > #include <stdio.h>
> > > #include <stdint.h>
> > > #include <limits.h>
> > >
> > > int main() {
> > >   int x = 0, *p = 0;
> > >   for (uintptr_t i = 0; ; i++) {
> > >     if (i == (uintptr_t)&x) { p = (int*)i; break; }
> > >   }
> > >   *p = 15;
> > >   printf("%d\n", x);
> > > }
> > >
> > > The loop is a no-op integer transformation that should set p to the
> > > address of x. But the compiler is not able to understand it and
> > > forgets the provenance. The C standard requires that if you transform
> > > the same integer back, you get the same pointer. So this is wrong.
> > >
> > >
> > > If one tried to formulate consistent rules which make this example
> > > have UB, then one would need to formulate a set of rules that exactly
> > > specifies how all possible operations on integers affect their
> > > provenance, and this ruleset might then say that copying integers leads
> > > to a loss of provenance. But this would then also mean in general that
> > > all optimizations GCC does on integers would need to conform to
> > > these rules about provenance, and not naively assume that integers
> > > are just integers. Since reasoning based on value equivalence already
> > > goes wrong for pointers, I assume it is practically impossible to make
> > > it work consistently for all integer operations.
> > >
> > >
> > > Dropping provenance for integers will remove this problem completely
> > > at the cost of some optimizations, and I think we should support a
> > > correct mode at least as an option. This dropping of provenance could
> > > also be done in the FE by inserting some __builtin or similar.
> > >
> > > >
> > > > >
> > > > > The model that *is* sound is to treat conversion to integer as
> > > > > escaped and pointers converted back from integers as pointing
> > > > > to any previously escaped provenance.
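A toy model of this escape-based rule, as a sketch only: objects are bits in a mask and a provenance is a set of objects. The struct and function names are hypothetical; this is neither GCC's nor LLVM's implementation.

```c
#include <stdint.h>

/* A provenance is a set of objects; bit k stands for object k.  */
typedef uint32_t prov_set;

struct escape_model {
  prov_set escaped;   /* objects whose address was ever converted to an integer */
};

/* Pointer -> integer: the integer itself carries no provenance, but every
   object the pointer may refer to is now considered escaped.  */
void model_ptr_to_int(struct escape_model *m, prov_set ptr_prov)
{
  m->escaped |= ptr_prov;
}

/* Integer -> pointer: the result may point to any object that escaped
   before this point, regardless of which integer value is converted.  */
prov_set model_int_to_ptr(const struct escape_model *m)
{
  return m->escaped;
}
```

Applied to the PR65752 loop, (uintptr_t)&x escapes x, so the pointer produced by (int*)i may point to x however i was computed, and *p = 15 must be visible in x.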
> > > > >
> > > > > LLVM also gets this wrong but my understanding is that they
> > > > > want to fix this.
> > > > >
> > > > > >
> > > > > > GCC considers literal zero to have "no" provenance (unless the
> > > > > > target claims objects can exist at address zero). So
> > > > > > (int *)((intptr_t)ptr + 0) retains the provenance of ptr.
> > > > > > (int *)((intptr_t)ptr + 4), OTOH, has the provenance of ptr merged
> > > > > > with 'nonlocal' provenance (a literal address never has the
> > > > > > provenance of a stack object). That is, constant folding loses the
> > > > > > fact that (void *)0 + 4 would have "no" provenance (actually not
> > > > > > sure whether the null "object" is subject to pointer arithmetic
> > > > > > constraints).
> > > > > >
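The rules above for literal addresses can be written as a small provenance calculator. This is a hedged sketch with hypothetical names, not GCC source: PROV_NONLOCAL stands for the 'nonlocal' provenance mentioned above (never a stack object), and the target is assumed not to place objects at address zero.

```c
#include <stdint.h>

/* A provenance is a bit set; the top bit is reserved for "nonlocal".  */
typedef uint32_t prov_set;
#define PROV_NONE     ((prov_set)0)
#define PROV_NONLOCAL ((prov_set)1u << 31)

/* Provenance of an integer literal used as (part of) an address.  */
prov_set literal_prov(intptr_t value)
{
  /* Zero is assumed never to be the address of an object.  */
  return value == 0 ? PROV_NONE : PROV_NONLOCAL;
}

/* Provenance of a + b when both operands are integers: the union.  */
prov_set plus_prov(prov_set a, prov_set b)
{
  return a | b;
}
```

The last assertion in the checks corresponds to the issue raised at the top of this thread: a literal such as 100 gets nonlocal rather than "anything" provenance, so it can never alias a stack variable.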
> > > > > > There's one thing GCC gets wrong (see some PRs), which is
> > > > > > "conditional provenance":
> > > > > >
> > > > > >   if (p == q)
> > > > > >     /* We now should treat p and q as having unioned provenance,
> > > > > >        since p can be substituted for q (and vice versa) by the
> > > > > >        compiler.  */
> > > > > >
> > > > > > I have not yet seen a good answer to this from the C pointer
> > > > > > provenance proposal folks.
> > > > >
> > > > > I am not sure what answer you want. This optimization is unsound.
> > > > >
> > > > > I think one could retain such optimizations by adding some additional
> > > > > conditions that exclude the special case that p and q have different
> > > > > provenance but the same address.
> > > >
> > > > I don't think this is workable for GCC. We'd have to disable all
> > > > conditional copy propagation (in the GCC case for both pointers
> > > > and integers, since the latter carry provenance).
> > >
> > > I think we should have a mode that does not carry provenance
> > > via integers. This is also what LLVM will do as far as I know,
> > > and also what the folks working on Rust semantics wanted when
> > > I last talked to them.
> > >
> > >
> > > Martin
> > >
> > >
> > > > My other "simple"
> > > > fix would be to make sure to unify provenances of p and q when
> > > > there's an equality compare but even that's a bit difficult if you
> > > > consider (SSA form)
> > > >
> > > > p_1 = p_2 + 1;
> > > > if (p_1 == q_3)
> > > > ...
> > > >
> > > > not only p_1 and q_3 would have to unify provenances but of course
> > > > also p_2 and all other pointers based on (or related to) p_1.
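A sketch of that "simple" fix using unification (union-find), which is deliberately coarser than GCC's actual points-to solver. SSA names are numbered by hand and every name here is hypothetical; the point is that once p_1 = p_2 + 1 links the two names, the equality with q_3 drags p_2 into the same class.

```c
/* SSA names are small integers; names in one class share a provenance.  */
#define MAX_NAMES 16

struct uf {
  int parent[MAX_NAMES];
};

void uf_init(struct uf *u)
{
  for (int k = 0; k < MAX_NAMES; k++)
    u->parent[k] = k;
}

int uf_find(struct uf *u, int k)
{
  while (u->parent[k] != k)
    {
      u->parent[k] = u->parent[u->parent[k]];  /* path halving */
      k = u->parent[k];
    }
  return k;
}

void uf_unite(struct uf *u, int a, int b)
{
  int ra = uf_find(u, a), rb = uf_find(u, b);
  if (ra != rb)
    u->parent[ra] = rb;
}

/* "dst = src + offset": dst is based on src.  */
void on_pointer_plus(struct uf *u, int dst, int src)
{
  uf_unite(u, dst, src);
}

/* "if (a == b)": either may later be substituted for the other.  */
void on_equality(struct uf *u, int a, int b)
{
  uf_unite(u, a, b);
}

int same_provenance(struct uf *u, int a, int b)
{
  return uf_find(u, a) == uf_find(u, b);
}
```

Because a union-find merge is not tied to the guarded block, the merged provenance applies throughout the function, which is part of the cost of this approach.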
> > > >
> > > > So I understand why you think that conditional copy propagation
> > > > is "unsound", because it does not play well with provenances.
> > > >
> > > > Richard.
> > > >
> > > > > Martin
> > > > >
> > > > > >
> > > > > > Richard.
> > > > > >
> > > > > > >
> > > > > > > Martin
> > > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > 3. The POINTER_PLUS operation is UB if the calculation overflows
> > > > > > > >    and TYPE_OVERFLOW_WRAPS(ptr_type) is false.
> > > > > > > >
> > > > > > > > 4. The rules are the same for the calculations done in MEM_REF
> > > > > > > >    and TARGET_MEM_REF as for POINTER_PLUS.
> > > > > > > >
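Rules 1 to 3 can be prototyped as a small abstract-pointer model. This is a sketch of the interpretation as stated in the question, with hypothetical names; it assumes 64-bit addresses and at most 32 objects, and treats leaving [0, 2^64) as the overflow of rule 3. Rule 4 would reuse pointer_plus for the address computations inside MEM_REF and TARGET_MEM_REF.

```c
#include <stdbool.h>
#include <stdint.h>

/* An object occupies [base, base + size).  */
struct object { uint64_t base, size; };

struct abs_ptr {
  uint64_t addr;
  uint32_t prov;  /* bit k set: may be derived from objs[k] */
};

/* Rule 1: valid if addr is inside, or one past, some object in its
   provenance.  */
bool ptr_valid(struct abs_ptr p, const struct object *objs, int nobjs)
{
  for (int k = 0; k < nobjs; k++)
    if (((p.prov >> k) & 1u)
        && p.addr >= objs[k].base
        && p.addr - objs[k].base <= objs[k].size)
      return true;
  return false;
}

/* Rules 2 and 3: POINTER_PLUS with a signed offset that itself carries
   provenance off_prov.  Returns false on overflow, i.e. UB when the
   pointer type does not wrap.  */
bool pointer_plus(struct abs_ptr p, int64_t off, uint32_t off_prov,
                  struct abs_ptr *out)
{
  uint64_t sum = p.addr + (uint64_t)off;   /* wraps modulo 2^64 */
  if (off >= 0 ? sum < p.addr : sum > p.addr)
    return false;                          /* left the address space */
  out->addr = sum;
  out->prov = p.prov | off_prov;           /* rule 2: union */
  return true;
}
```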
> > > > > > > > Question: For the TARGET_MEM_REF calculation:
> > > > > > > >   BASE + STEP * INDEX + INDEX2 + OFFSET
> > > > > > > > is it treated as one POINTER_PLUS, i.e.
> > > > > > > >   BASE + (STEP * INDEX + INDEX2 + OFFSET)
> > > > > > > > or as two (i.e. do we care about overflow and OOB between the
> > > > > > > > two index calculations)?
> > > > > > > >
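The difference between the two readings can be checked with plain offset arithmetic. This is a sketch with hypothetical names, assuming offsets are measured from the start of a single object, that "two POINTER_PLUS" means BASE + STEP * INDEX first and the remaining terms second, and ignoring overflow of the offset arithmetic itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* In bounds means inside the object or one past its end.  */
static bool in_object(int64_t off, int64_t obj_size)
{
  return off >= 0 && off <= obj_size;
}

/* Reading 1: one POINTER_PLUS of the summed offset; only BASE and the
   final address must be in bounds.  */
bool single_plus_ok(int64_t obj_size, int64_t base_off, int64_t step,
                    int64_t index, int64_t index2, int64_t offset)
{
  return in_object(base_off, obj_size)
         && in_object(base_off + step * index + index2 + offset, obj_size);
}

/* Reading 2: BASE + STEP * INDEX is a pointer in its own right, so that
   intermediate address must be in bounds as well.  */
bool two_plus_ok(int64_t obj_size, int64_t base_off, int64_t step,
                 int64_t index, int64_t index2, int64_t offset)
{
  return single_plus_ok(obj_size, base_off, step, index, index2, offset)
         && in_object(base_off + step * index, obj_size);
}
```

The readings differ exactly when an intermediate sum leaves the object and a later term brings it back, as in the first pair of checks (a 16-byte object with STEP * INDEX = 20 and OFFSET = -8).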
> > > > > > > >
> > > > > > > > FWIW, the vectorizer and ivopts do introduce pointers that are
> > > > > > > > outside the object (which is why I allowed it in my current
> > > > > > > > semantics)...
> > > > > > > >
> > > > > > > > /Krister