This version is even nicer: https://godbolt.org/z/3M4Y6Pa3h
In both cases the compiler understands that the function is the
identity on integers and optimizes it to a move, but it still
makes an aliasing decision that is inconsistent with this basic
fact when this integer is back-converted to a pointer.
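
For readers without the link at hand, here is a minimal sketch of the
kind of round trip meant here. It is not the code behind the godbolt
link; the names identity() and roundtrip_store() are made up for
illustration, and the bitwise loop is just one assumed way to write an
integer identity function.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper: rebuilds v bit by bit, so it returns v unchanged. */
static uintptr_t identity(uintptr_t v)
{
    uintptr_t r = 0;
    for (unsigned bit = 0; bit < sizeof v * CHAR_BIT; bit++)
        r |= v & ((uintptr_t)1 << bit);
    return r;
}

/* Stores through a pointer that made a round trip through an integer. */
static int roundtrip_store(void)
{
    int x = 0;
    int *p = (int *)identity((uintptr_t)&x);
    assert(p == &x);  /* the round trip must give back a pointer equal to &x */
    *p = 15;          /* so this store has to be treated as a store to x */
    return x;
}
```

Calling roundtrip_store() from main() and printing the result is enough
to try it. The point of the sketch is only that the integer fed to the
back-conversion has the same value as (uintptr_t)&x, whatever the
compiler concludes about where it came from.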
Martin
On Friday, 12 Dec 2025 at 18:23 +0100, Martin Uecker wrote:
> On Friday, 12 Dec 2025 at 16:16 +0100, Richard Biener wrote:
> > On Thu, Dec 11, 2025 at 8:33 PM Martin Uecker <[email protected]> wrote:
> > >
> > > On Thursday, 11 Dec 2025 at 12:48 +0100, Richard Biener wrote:
> > > > On Thu, Dec 11, 2025 at 7:48 AM Martin Uecker <[email protected]> wrote:
> > > > >
> > > > > On Thursday, 11 Dec 2025 at 04:12 +0000, Krister Walfridsson
> > > > > via Gcc wrote:
> > > > > > On Wed, 10 Dec 2025, Richard Biener wrote:
> > > > > >
> > > > > > > > The problem is that in GIMPLE, a pointer does not need to be in
> > > > > > > > bounds. The caller could call the function with a value of i
> > > > > > > > such that p + i happens to be equal to &a. So, as I understand
> > > > > > > > it, the GIMPLE semantics do not allow the pass to conclude that
> > > > > > > > p + i == &a is false, unless p + i is dereferenced (because
> > > > > > > > dereferencing a through p + i would be UB due to provenance).
> > > > > > >
> > > > > > > GIMPLE adopts most of the C pointer restrictions here thus we can
> > > > > > > (and do) conclude that pointers stay within an object when
> > > > > > > advanced. This is used by the PTA pass, whose results are used
> > > > > > > when we optimize your example. You have to divert to integer
> > > > > > > arithmetic to circumvent this and the PTA pass, while tracking
> > > > > > > provenance through integers as well, does the right thing with
> > > > > > > this.
> > > > > >
> > > > > > Great, that is much better for smtgcc than the semantics I have
> > > > > > currently implemented!
> > > > > >
> > > > > > But it is not completely clear to me what "most of the C pointer
> > > > > > restrictions" implies. Is the following a correct interpretation?
> > > > > >
> > > > > > 1. A pointer must contain a value that points into (or one past) an
> > > > > > object corresponding to its provenance (where a pointer may have
> > > > > > multiple provenances). Otherwise it invokes undefined behavior.
> > > > > >
> > > > > > 2. The provenance used for the result of POINTER_PLUS is the union
> > > > > > of the provenances for the two arguments.
> > > > >
> > > > > Note that in C one argument would be an integer and there is no
> > > > > provenance on integers in C as this can not work consistently.
> > > > >
> > > > > (and I think GCC gets this wrong)
> > > >
> > > > What GCC gets "right" (right in terms of improving optimization) is
> > > > that (int *)(intptr_t)ptr has the same provenance as ptr.
> > >
> > > The problem is nobody could come up with a convincing model
> > > for this that is sound. Currently GCC breaks the requirement
> > > that roundtrips through integers have to work in all cases
> > > because sometimes the compiler gets confused about the
> > > provenance of the back-converted pointer and assigns the
> > > wrong one.
> >
> > Does it? I don't remember such a case, can you point me to it?
>
> I think there are a couple of PRs related to this. One which
> illustrates the underlying issue nicely is this one:
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65752
>
> #include <stdio.h>
> #include <stdint.h>
> #include <limits.h>
>
> int main() {
>   int x = 0, *p = 0;
>   for (uintptr_t i = 0; ; i++) {
>     if (i == (uintptr_t)&x) { p = (int*)i; break; }
>   }
>   *p = 15;
>   printf("%d\n", x);
> }
>
> The loop is a no-op integer transformation that should set p to the
> address of x. But the compiler is not able to understand it and
> forgets the provenance. The C standard requires that if you convert
> the same integer back, you get the same pointer. So this is wrong.
>
>
> If one tried to formulate consistent rules which make this example
> have UB, then one would need a set of rules that specifies exactly
> how all possible operations on integers affect their provenance, and
> this ruleset might then say that copying integers leads to a loss of
> provenance. But this then also means that, in general, all
> optimizations GCC does on integers would need to conform to these
> rules about provenance, and not naively assume that integers are just
> integers. As reasoning based on value equivalence already goes wrong
> for pointers, I assume it is practically impossible to make it work
> consistently for all integer operations.
>
>
> Dropping provenance for integers will remove this problem completely
> at the cost of some optimizations, and I think we should support a
> correct mode at least as an option. This dropping of provenance could
> also be done in the FE by inserting some __builtin or similar.
>
> >
> > >
> > > The model that *is* sound is to treat conversion to integer as
> > > escaped and pointers converted back from integers as pointing
> > > to any previously escaped provenance.
> > >
> > > LLVM also gets this wrong but my understanding is that they
> > > want to fix this.
> > >
> > > >
> > > > > GCC considers literal zero to have "no" provenance (unless the target
> > > > > claims objects can exist at address zero). So
> > > > > (int *)((intptr_t)ptr + 0) retains the provenance of ptr.
> > > > > (int *)((intptr_t)ptr + 4) OTOH has
> > > > provenance of ptr merged with 'nonlocal' provenance (a literal address
> > > > never has provenance of a stack object). That is, constant folding
> > > > loses the fact that (void *)0 + 4 would have "no" provenance (actually
> > > > not sure whether the null "object" is subject to pointer arithmetic
> > > > constraints).
> > > >
> > > > There's one thing GCC gets wrong (see some PRs) which is
> > > > "conditional provenance".
> > > >
> > > > if (p == q)
> > > > /* we now should treat p and q as having unioned provenance since
> > > > p can be substituted for q (and vice versa) by the compiler. */
> > > >
> > > > > I have not yet seen a good answer to this from the C pointer provenance
> > > > > proposal folks.
> > >
> > > I am not sure what answer you want. This optimization is unsound.
> > >
> > > I think one could retain such optimizations by adding some additional
> > > conditions that exclude the special case that p and q have different
> > > provenance but the same address.
> >
> > I don't think this is workable for GCC. We'd have to disable all
> > conditional copy propagation (in the GCC case for both pointers
> > and integers, since the latter carry provenance).
>
> I think we should have a mode that does not carry provenance
> via integers. This is also what LLVM will do as far as I know,
> and also what the folks working on Rust semantics wanted when
> I last talked to them.
>
>
> Martin
>
>
> > My other "simple"
> > fix would be to make sure to unify provenances of p and q when
> > there's an equality compare but even that's a bit difficult if you
> > consider (SSA form)
> >
> > p_1 = p_2 + 1;
> > if (p_1 == q_3)
> > ...
> >
> > not only p_1 and q_3 would have to unify provenances but of course
> > also p_2 and all other pointers based on (or related to) p_1.
> >
> > So I understand why you think that conditional copy propagation
> > is "unsound", because it does not play well with provenances.
> >
> > Richard.
> >
> > > Martin
> > >
> > > >
> > > > Richard.
> > > >
> > > > >
> > > > > Martin
> > > > >
> > > > >
> > > > > >
> > > > > > 3. The POINTER_PLUS operation is UB if the calculation overflows and
> > > > > > TYPE_OVERFLOW_WRAPS(ptr_type) is false.
> > > > > >
> > > > > > 4. The rules are the same for the calculations done in MEM_REF and
> > > > > > TARGET_MEM_REF as for POINTER_PLUS.
> > > > > >
> > > > > > Question: For the TARGET_MEM_REF calculation:
> > > > > > BASE + STEP * INDEX + INDEX2 + OFFSET
> > > > > > Is it treated as one POINTER_PLUS, i.e.
> > > > > > BASE + (STEP * INDEX + INDEX2 + OFFSET)
> > > > > > or as two (i.e. do we care about overflow and OOB between the two
> > > > > > index calculations)?
> > > > > >
> > > > > >
> > > > > > FWIW, the vectorizer and ivopts do introduce pointers that are
> > > > > > outside the object (which is why I allowed it in my current
> > > > > > semantics)...
> > > > > >
> > > > > > /Krister