Am Freitag, dem 12.12.2025 um 18:50 +0700 schrieb Jason Merrill: > On 12/12/25 2:10 PM, Martin Uecker wrote: > > Am Freitag, dem 12.12.2025 um 09:21 +0700 schrieb Jason Merrill via Gcc: > > > On Fri, Dec 12, 2025, 8:57 a.m. Krister Walfridsson via Gcc > > > <[email protected]> > > > wrote: > > > > > > > On Thu, 11 Dec 2025, Richard Biener wrote: > > > > > > > > > Date: Thu, 11 Dec 2025 12:38:43 +0100 > > > > > From: Richard Biener <[email protected]> > > > > > To: Krister Walfridsson <[email protected]> > > > > > Cc: [email protected] > > > > > Subject: Re: pointer comparison in GIMPLE > > > > > > > > > > On Thu, Dec 11, 2025 at 5:12 AM Krister Walfridsson > > > > > <[email protected]> wrote: > > > > > > > > > > > > On Wed, 10 Dec 2025, Richard Biener wrote: > > > > > > > > > > > > > > The problem is that in GIMPLE, a pointer does not need to be in > > > > bounds. The caller could call the function with a value of i such that > > > > p + > > > > i happens to be equal to &a. So, as I understand it, the GIMPLE > > > > semantics > > > > do not allow the pass to conclude that p + i == &a is false, unless p + > > > > i > > > > is dereferenced (because dereferencing a through p + i would be UB due > > > > to > > > > provenance). > > > > > > > > > > > > > > GIMPLE adopts most of the C pointer restrictions here thus we can > > > > > > > (and > > > > do) conclude that pointers stay within an object when advanced. This is > > > > used by the PTA pass which results are used when we optimize your > > > > example. > > > > You have to divert to integer arithmetic to circumvent this and the PTA > > > > pass, while tracking provenance through integers as well, does the right > > > > thing with this. > > > > > > > > > > > > Great, that is much better for smtgcc than the semantics I have > > > > currently > > > > > > implemented! > > > > > > > > > > > > But it is not completely clear to me what "most of the C pointer > > > > > > restrictions" implies. Is the following a correct interpretation? > > > > > > > > > > > > 1. A pointer must contain a value that points into (or one past) an > > > > object > > > > > > corresponding to its provenance (where a pointer may have multiple > > > > > > provenances). Otherwise it invokes undefined behavior. > > > > > > > > > > Hmm. I think it's only UB when you'd "use" that pointer. That is, > > > > > PTA > > > > would > > > > > compute the points-to set to 'nothing'. The immediate consequences > > > > > are > > > > such > > > > > pointer isn't equal to any other pointer and accesses through it alias > > > > > with nothing, > > > > > stores would be DSEd. But at the point a SSA var is assigned such a > > > > pointer > > > > > we couldn't place a trap() (?) > > > > > > > > > > > 2. The provenance used for the result of POINTER_PLUS is the union > > > > > > of > > > > the > > > > > > provenances for the two arguments. > > > > > > > > > > For POINTER_PLUS it's the provenance of the first argument. > > > > > > > > > > For PLUS_EXPR it is the union of both arguments. For > > > > > POINTER_DIFF_EXPR > > > > > the result has no provenance. > > > > > > > > > > > 3. The POINTER_PLUS operation is UB if the calculation overflows and > > > > > > TYPE_OVERFLOW_WRAPS(ptr_type) is false. > > > > > > > > > > Yes. > > > > > > > > > > > 4. The rules are the same for the calculations done in MEM_REF and > > > > > > TARGET_MEM_REF as for POINTER_PLUS. > > > > > > > > > > Yes. > > > > > > > > > > > Question: For the TARGET_MEM_REF calculation: > > > > > > BASE + STEP * INDEX + INDEX2 + OFFSET > > > > > > Is it treated as one POINTER_PLUS, i.e. > > > > > > BASE + (STEP * INDEX + INDEX2 + OFFSET) > > > > > > or as two (i.e. do we care about overflow and OOB between the two > > > > > > index > > > > > > calculations)? > > > > > > > > > > I'd say it counts as one pointer + offset calculation with all the > > > > > offset > > > > > calculation being done in wrapping operations. > > > > > > > > Your answers match exactly what is currently implemented in smtgcc, so I > > > > am still thinking this is a bug in GCC (or that there is some missing > > > > GIMPLE rule I must implement). > > > > > > > > The original program looks in GIMPLE like: > > > > > > > > void foo (char * p, long long int i) > > > > { > > > > char a; > > > > sizetype i.0_1; > > > > char * _2; > > > > > > > > <bb 2> : > > > > i.0_1 = (sizetype) i_3(D); > > > > _2 = p_4(D) + i.0_1; > > > > if (_2 == &a) > > > > goto <bb 3>; > > > > else > > > > goto <bb 4>; > > > > > > > > <bb 3> : > > > > __builtin_abort (); > > > > > > > > <bb 4> : > > > > a ={v} {CLOBBER(eos)}; > > > > return; > > > > } > > > > > > > > Assume, for the sake of argument, that the address of a is 0x2000000, p > > > > = > > > > 0x1000000 and i = 0x1000000. > > > > > > > > With the semantics as described in this mail thread, all operations are > > > > defined: > > > > * _2 evaluates to 0x2000000, with the provenance of p (although the > > > > provenance is irrelevant in this execution). > > > > * The comparison is also defined and evaluates to true. > > > > * As a result, the program then calls __builtin_abort, which exits. > > > > > > > > A valid optimization must produce the same result (including side > > > > effects) > > > > given the same input for executions where all steps have defined > > > > semantics. > > > > > > > > > Except this comparison has an unspecified value, so an optimization is > > > allowed to change it. > > > > Is it? For, C this is defined. > > Ah, indeed this seems to be a difference between C and C++. C++: > > https://eel.is/c++draft/expr#eq-3.1 > "If one pointer represents the address of a complete object, and another > pointer represents the address one past the last element of a different > complete object, the result of the comparison is unspecified." > > C: > > "Two pointers compare equal if and only if ... one is a pointer to one > past the end of one array object and the other is a pointer to the start > of a different array object that happens to immediately follow the first > array object in the address space." >
We certainly also considered changes for C. But mostly the problem with "unspecified" is that it practice it is not a lot better than "undefined". Optimizers confuse themselves if they then have other optimizations which rely on the stability of the value. For example, see the comment by Alexander Cherepanov: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61502#c42 One could make comparisons for unrelated pointers for the case that one is a one-after pointer undefined in C (and C++), but this is difficult to sell. So for me it seems that GCC should be changed to make these comparisons deterministic based on the address as required by the C standard. Conditional equivalences need to have guards against the special case of one-after pointers. (for example, after a pointer is dereferenced once, one can be sure it is not a one-after pointer) Of course, GIMPLE could also have other semantics than C/C++, but then we need a way to express C/C++ semantics correctly. For example, if we had __builtin_expose / __builtin_synthesize that mark a pointer escaped or as of unknown provenance, respectively (maybe we have something like this?) then the front-ends could add those for casts from/to integers and for comparisons to implement the desired semantics. I assume the Rust FE will also need a solution. Martin
