On Friday, 12.12.2025 at 16:16 +0100, Richard Biener wrote:
> On Thu, Dec 11, 2025 at 8:33 PM Martin Uecker <[email protected]> wrote:
> > 
> > On Thursday, 11.12.2025 at 12:48 +0100, Richard Biener wrote:
> > > On Thu, Dec 11, 2025 at 7:48 AM Martin Uecker <[email protected]> wrote:
> > > > 
> > > > On Thursday, 11.12.2025 at 04:12 +0000, Krister Walfridsson via Gcc wrote:
> > > > > On Wed, 10 Dec 2025, Richard Biener wrote:
> > > > > 
> > > > > > > The problem is that in GIMPLE, a pointer does not need to be in 
> > > > > > > bounds. The caller could call the function with a value of i such 
> > > > > > > that p + i happens to be equal to &a. So, as I understand it, the 
> > > > > > > GIMPLE semantics do not allow the pass to conclude that p + i == 
> > > > > > > &a is false, unless p + i is dereferenced (because dereferencing 
> > > > > > > a through p + i would be UB due to provenance).
> > > > > > 
> > > > > > GIMPLE adopts most of the C pointer restrictions here thus we can 
> > > > > > (and do) conclude that pointers stay within an object when 
> > > > > > > advanced.  This is used by the PTA pass whose results are used when
> > > > > > we optimize your example.  You have to divert to integer arithmetic 
> > > > > > to circumvent this and the PTA pass, while tracking provenance 
> > > > > > through integers as well, does the right thing with this.
> > > > > 
> > > > > > Great, that is much better for smtgcc than the semantics I have
> > > > > > currently implemented!
> > > > > 
> > > > > But it is not completely clear to me what "most of the C pointer
> > > > > restrictions" implies. Is the following a correct interpretation?
> > > > > 
> > > > > > 1. A pointer must contain a value that points into (or one past) an
> > > > > > object corresponding to its provenance (where a pointer may have
> > > > > > multiple provenances). Otherwise it invokes undefined behavior.
> > > > > 
> > > > > > 2. The provenance used for the result of POINTER_PLUS is the union of
> > > > > > the provenances for the two arguments.
> > > > 
> > > > Note that in C one argument would be an integer and there is no
> > > > provenance on integers in C as this can not work consistently.
> > > > 
> > > > (and I think GCC gets this wrong)
> > > 
> > > What GCC gets "right" (right in terms of improving optimization) is
> > > that (int *)(intptr_t)ptr has the same provenance as ptr.
> > 
> > The problem is nobody could come up with a convincing model
> > for this that is sound.  Currently GCC breaks the requirement
> > that roundtrips through integers have to work in all cases
> > because sometimes the compiler gets confused about the
> > provenance of the back-converted pointer and assigns the
> > wrong one.
> 
> Does it?  I don't remember such a case, can you point me to it?

I think there are a couple of PRs related to this. One that
illustrates the underlying issue nicely is this one:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65752

#include <stdio.h>
#include <stdint.h>
#include <limits.h>

int main() {
  int x = 0, *p = 0;
  for (uintptr_t i = 0; ; i++) {
    if (i == (uintptr_t)&x) { p = (int*)i; break; }
  }
  *p = 15;
  printf("%d\n", x);
}

The loop is a no-op integer transformation that should set p to the
address of x.  But the compiler does not see through the loop and
loses the provenance of p.  The C standard requires that if you transform
the same integer back, you get the same pointer.  So this is wrong.
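
For contrast, here is a minimal sketch of the direct roundtrip that the
requirement covers.  The helper name roundtrip_through_integer is made
up for illustration, and the casts go through void * because that is
the conversion uintptr_t is specified for:

```c
#include <stdint.h>

/* Hypothetical helper: convert a pointer to an integer and straight
   back.  The result is expected to compare equal to the argument and
   to be usable for accessing the same object. */
int *roundtrip_through_integer(int *p) {
  uintptr_t u = (uintptr_t)(void *)p;
  return (int *)(void *)u;
}
```

Storing through the result of roundtrip_through_integer(&x) is expected
to modify x.  The loop in the PR arrives at the same integer value, only
by a different route.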


If one tried to formulate consistent rules which make this example
have UB, then one would need a set of rules that specifies exactly
how every possible operation on integers affects their provenance,
and this rule set might then say that copying integers leads to a
loss of provenance.  But this also means that, in general, all
optimizations GCC does on integers would need to conform to these
provenance rules and could not naively assume that integers are just
integers.  As reasoning based on value equivalence already goes
wrong for pointers, I assume it is practically impossible to make
it work consistently for all integer operations.


Dropping provenance for integers will remove this problem completely
at the cost of some optimizations, and I think we should support a
correct mode at least as an option.   This dropping of provenance could
also be done in the FE by inserting some __builtin or similar.
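
As a rough illustration of what "no provenance on integers" means in
the model quoted further down (conversion to integer counts as an
escape, conversion back may pick any escaped object), here is a toy
sketch.  All names (toy_object, toy_ptr_to_int, toy_int_to_ptr) are
invented for this example, this is not GCC code, and picking the first
matching object is a simplification:

```c
#include <stddef.h>
#include <stdint.h>

/* Toy model: each object records its address range and whether its
   address has escaped through a pointer-to-integer conversion. */
struct toy_object {
  uintptr_t base;
  size_t size;
  int exposed;
};

/* Pointer-to-integer conversion: mark the object as exposed and return
   a plain integer that carries no provenance of its own. */
uintptr_t toy_ptr_to_int(struct toy_object *obj, size_t offset) {
  obj->exposed = 1;
  return obj->base + offset;
}

/* Integer-to-pointer conversion: the result may only refer to an
   exposed object whose range (including one past the end) contains
   the address.  Returns the index of the first match, or -1. */
int toy_int_to_ptr(const struct toy_object *objs, size_t n,
                   uintptr_t addr) {
  for (size_t i = 0; i < n; i++) {
    if (objs[i].exposed && addr >= objs[i].base
        && addr - objs[i].base <= objs[i].size)
      return (int)i;
  }
  return -1;
}
```

In this sketch the integer itself is just a number, so any arithmetic
or copying on it needs no extra rules.  Only the two conversions touch
the bookkeeping.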

> 
> > 
> > The model that *is* sound is to treat conversion to integer as
> > escaped and pointers converted back from integers as pointing
> > to any previously escaped provenance.
> > 
> > LLVM also gets this wrong but my understanding is that they
> > want to fix this.
> > 
> > > 
> > > GCC considers literal zero to have "no" provenance (unless the target
> > > claims objects can exist at address zero).  So (int *)((intptr_t)ptr + 0)
> > > retains the provenance of ptr.  (int *)((intptr_t)ptr + 4) OTOH has
> > > provenance of ptr merged with 'nonlocal' provenance (a literal address
> > > never has provenance of a stack object).  That is, constant folding
> > > loses the fact that (void *)0 + 4 would have "no" provenance (actually
> > > not sure whether the null "object" is subject to pointer arithmetic
> > > constraints).
> > > 
> > > There's one thing GCC gets wrong (see some PRs) which is
> > > "conditional provenance".
> > > 
> > >  if (p == q)
> > >    /* we now should treat p and q as having unioned provenance since
> > >       p can be substituted for q (and vice versa) by the compiler.  */
> > > 
> > > > I have not yet seen a good answer to this from the C pointer provenance
> > > > proposal folks.
> > 
> > I am not sure what answer you want. This optimization is unsound.
> > 
> > I think one could retain such optimizations by adding some additional
> > conditions that exclude the special case that p and q have different
> > provenance but the same address.
> 
> I don't think this is workable for GCC.  We'd have to disable all
> conditional copy propagation (in the GCC case for both pointers
> and integers, since the latter carry provenance).  
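
To make the special case concrete, here is a minimal sketch (the
function name store_if_equal is hypothetical).  The interesting call is
one where p is one past the end of some object and q points to a
different object that happens to be adjacent in memory:

```c
/* Hypothetical example: stores through q when p and q compare equal. */
int store_if_equal(int *p, int *q) {
  if (p == q) {
    /* Conditional copy propagation may rewrite this store as *p = 1.
       That is harmless when p and q have the same provenance.  If p is
       one past the end of another object, *p would be an out-of-bounds
       access even though *q is a valid store. */
    *q = 1;
    return 1;
  }
  return 0;
}
```

With identical arguments the store happens and is well defined.  The
problem only appears for equal addresses with different provenance.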

I think we should have a mode that does not carry provenance
via integers.  This is also what LLVM will do as far as I know,
and also what the folks working on Rust semantics wanted when
I last talked to them.


Martin


> My other "simple"
> fix would be to make sure to unify provenances of p and q when
> there's an equality compare but even that's a bit difficult if you
> consider (SSA form)
> 
>   p_1 = p_2 + 1;
>   if (p_1 == q_3)
>     ...
> 
> not only p_1 and q_3 would have to unify provenances but of course
> also p_2 and all other pointers based on (or related to) p_1.
> 
> So I understand why you think that conditional copy propagation
> is "unsound", because it does not play well with provenances.
> 
> Richard.
> 
> > Martin
> > 
> > > 
> > > Richard.
> > > 
> > > > 
> > > > Martin
> > > > 
> > > > 
> > > > > 
> > > > > 3. The POINTER_PLUS operation is UB if the calculation overflows and
> > > > > TYPE_OVERFLOW_WRAPS(ptr_type) is false.
> > > > > 
> > > > > 4. The rules are the same for the calculations done in MEM_REF and
> > > > > TARGET_MEM_REF as for POINTER_PLUS.
> > > > > 
> > > > > Question: For the TARGET_MEM_REF calculation:
> > > > >    BASE + STEP * INDEX + INDEX2 + OFFSET
> > > > > Is it treated as one POINTER_PLUS, i.e.
> > > > >    BASE + (STEP * INDEX + INDEX2 + OFFSET)
> > > > > > or as two (i.e. do we care about overflow and OOB between the two
> > > > > > index calculations)?
> > > > > 
> > > > > 
> > > > > > FWIW, the vectorizer and ivopts do introduce pointers that are
> > > > > > outside the object (which is why I allowed it in my current
> > > > > > semantics)...
> > > > > 
> > > > >     /Krister
