On Thursday, 11.12.2025 at 12:48 +0100, Richard Biener wrote:
> On Thu, Dec 11, 2025 at 7:48 AM Martin Uecker <[email protected]> wrote:
> >
> > On Thursday, 11.12.2025 at 04:12 +0000, Krister Walfridsson via Gcc wrote:
> > > On Wed, 10 Dec 2025, Richard Biener wrote:
> > >
> > > > > The problem is that in GIMPLE, a pointer does not need to be in
> > > > > bounds. The caller could call the function with a value of i such
> > > > > that p + i happens to be equal to &a. So, as I understand it, the
> > > > > GIMPLE semantics do not allow the pass to conclude that p + i == &a
> > > > > is false, unless p + i is dereferenced (because dereferencing a
> > > > > through p + i would be UB due to provenance).
> > > >
> > > > GIMPLE adopts most of the C pointer restrictions here, thus we can (and
> > > > do) conclude that pointers stay within an object when advanced. This
> > > > is used by the PTA pass, whose results are used when we optimize your
> > > > example. You have to divert to integer arithmetic to circumvent this,
> > > > and the PTA pass, while tracking provenance through integers as well,
> > > > does the right thing with this.
> > >
> > > Great, that is much better for smtgcc than the semantics I have currently
> > > implemented!
> > >
> > > But it is not completely clear to me what "most of the C pointer
> > > restrictions" implies. Is the following a correct interpretation?
> > >
> > > 1. A pointer must contain a value that points into (or one past) an object
> > > corresponding to its provenance (where a pointer may have multiple
> > > provenances). Otherwise it invokes undefined behavior.
> > >
> > > 2. The provenance used for the result of POINTER_PLUS is the union of the
> > > provenances for the two arguments.
> >
> > Note that in C one argument would be an integer, and there is no
> > provenance on integers in C as this cannot work consistently.
> >
> > (and I think GCC gets this wrong)
>
> What GCC gets "right" (right in terms of improving optimization) is
> that (int *)(intptr_t)ptr has the same provenance as ptr.

The problem is that nobody could come up with a convincing model for
this that is sound. Currently GCC breaks the requirement that round
trips through integers have to work in all cases, because sometimes the
compiler gets confused about the provenance of the back-converted
pointer and assigns the wrong one. The model that *is* sound is to
treat the conversion of a pointer to an integer as an escape, and
pointers converted back from integers as pointing to any previously
escaped provenance. LLVM also gets this wrong, but my understanding is
that they want to fix this.

>
> GCC considers literal zero to have "no" provenance (unless the target
> claims objects can exist at address zero). So (int *)((intptr_t)ptr + 0)
> retains the provenance of ptr. (int *)((intptr_t)ptr + 4) OTOH has
> provenance of ptr merged with 'nonlocal' provenance (a literal address
> never has provenance of a stack object). That is, constant folding
> loses the fact that (void *)0 + 4 would have "no" provenance (actually
> not sure whether the null "object" is subject to pointer arithmetic
> constraints).
>
> There's one thing GCC gets wrong (see some PRs), which is
> "conditional provenance".
>
>   if (p == q)
>     /* we now should treat p and q as having unioned provenance since
>        p can be substituted for q (and vice versa) by the compiler.  */
>
> I have not yet seen a good answer to this from the C pointer provenance
> proposal folks.

I am not sure what answer you want. This optimization is unsound. I
think one could retain such optimizations by adding some additional
conditions that exclude the special case where p and q have different
provenance but the same address.

Martin

>
> Richard.
>
> >
> > Martin
> >
> > >
> > > 3. The POINTER_PLUS operation is UB if the calculation overflows and
> > > TYPE_OVERFLOW_WRAPS(ptr_type) is false.
> > >
> > > 4. The rules are the same for the calculations done in MEM_REF and
> > > TARGET_MEM_REF as for POINTER_PLUS.
> > >
> > > Question: For the TARGET_MEM_REF calculation:
> > >   BASE + STEP * INDEX + INDEX2 + OFFSET
> > > Is it treated as one POINTER_PLUS, i.e.
> > >   BASE + (STEP * INDEX + INDEX2 + OFFSET)
> > > or as two (i.e. do we care about overflow and OOB between the two index
> > > calculations)?
> > >
> > >
> > > FWIW, the vectorizer and ivopts do introduce pointers that are outside the
> > > object (which is why I allowed it in my current semantics)...
> > >
> > > /Krister
