On Fri, Dec 12, 2025, 8:57 a.m. Krister Walfridsson via Gcc <[email protected]> wrote:
> On Thu, 11 Dec 2025, Richard Biener wrote:
>
> > Date: Thu, 11 Dec 2025 12:38:43 +0100
> > From: Richard Biener <[email protected]>
> > To: Krister Walfridsson <[email protected]>
> > Cc: [email protected]
> > Subject: Re: pointer comparison in GIMPLE
> >
> > On Thu, Dec 11, 2025 at 5:12 AM Krister Walfridsson
> > <[email protected]> wrote:
> >>
> >> On Wed, 10 Dec 2025, Richard Biener wrote:
> >>
> >>>> The problem is that in GIMPLE, a pointer does not need to be in
> >>>> bounds. The caller could call the function with a value of i such
> >>>> that p + i happens to be equal to &a. So, as I understand it, the
> >>>> GIMPLE semantics do not allow the pass to conclude that p + i == &a
> >>>> is false, unless p + i is dereferenced (because dereferencing a
> >>>> through p + i would be UB due to provenance).
> >>>
> >>> GIMPLE adopts most of the C pointer restrictions here thus we can
> >>> (and do) conclude that pointers stay within an object when advanced.
> >>> This is used by the PTA pass which results are used when we optimize
> >>> your example. You have to divert to integer arithmetic to circumvent
> >>> this and the PTA pass, while tracking provenance through integers as
> >>> well, does the right thing with this.
> >>
> >> Great, that is much better for smtgcc than the semantics I have
> >> currently implemented!
> >>
> >> But it is not completely clear to me what "most of the C pointer
> >> restrictions" implies. Is the following a correct interpretation?
> >>
> >> 1. A pointer must contain a value that points into (or one past) an
> >> object corresponding to its provenance (where a pointer may have
> >> multiple provenances). Otherwise it invokes undefined behavior.
> >
> > Hmm. I think it's only UB when you'd "use" that pointer. That is, PTA
> > would compute the points-to set to 'nothing'. The immediate
> > consequences are such pointer isn't equal to any other pointer and
> > accesses through it alias with nothing, stores would be DSEd.
> > But at the point a SSA var is assigned such a pointer
> > we couldn't place a trap() (?)
> >
> >> 2. The provenance used for the result of POINTER_PLUS is the union of
> >> the provenances for the two arguments.
> >
> > For POINTER_PLUS it's the provenance of the first argument.
> >
> > For PLUS_EXPR it is the union of both arguments. For POINTER_DIFF_EXPR
> > the result has no provenance.
> >
> >> 3. The POINTER_PLUS operation is UB if the calculation overflows and
> >> TYPE_OVERFLOW_WRAPS(ptr_type) is false.
> >
> > Yes.
> >
> >> 4. The rules are the same for the calculations done in MEM_REF and
> >> TARGET_MEM_REF as for POINTER_PLUS.
> >
> > Yes.
> >
> >> Question: For the TARGET_MEM_REF calculation:
> >>    BASE + STEP * INDEX + INDEX2 + OFFSET
> >> Is it treated as one POINTER_PLUS, i.e.
> >>    BASE + (STEP * INDEX + INDEX2 + OFFSET)
> >> or as two (i.e. do we care about overflow and OOB between the two index
> >> calculations)?
> >
> > I'd say it counts as one pointer + offset calculation with all the offset
> > calculation being done in wrapping operations.
>
> Your answers match exactly what is currently implemented in smtgcc, so I
> am still thinking this is a bug in GCC (or that there is some missing
> GIMPLE rule I must implement).
>
> The original program looks in GIMPLE like:
>
>    void foo (char * p, long long int i)
>    {
>      char a;
>      sizetype i.0_1;
>      char * _2;
>
>      <bb 2> :
>      i.0_1 = (sizetype) i_3(D);
>      _2 = p_4(D) + i.0_1;
>      if (_2 == &a)
>        goto <bb 3>;
>      else
>        goto <bb 4>;
>
>      <bb 3> :
>      __builtin_abort ();
>
>      <bb 4> :
>      a ={v} {CLOBBER(eos)};
>      return;
>    }
>
> Assume, for the sake of argument, that the address of a is 0x2000000, p =
> 0x1000000 and i = 0x1000000.
>
> With the semantics as described in this mail thread, all operations are
> defined:
> * _2 evaluates to 0x2000000, with the provenance of p (although the
>   provenance is irrelevant in this execution).
> * The comparison is also defined and evaluates to true.
> * As a result, the program then calls __builtin_abort, which exits.
>
> A valid optimization must produce the same result (including side effects)
> given the same input for executions where all steps have defined
> semantics.

Except this comparison has an unspecified value, so an optimization is
allowed to change it.

> Therefore, an optimization that does not call __builtin_abort
> for this input is buggy (or the semantics is incorrect).
>
> You said in your second mail: "GIMPLE adopts most of the C pointer
> restrictions here thus we can (and do) conclude that pointers stay within
> an object when advanced. This is used by the PTA pass which results are
> used when we optimize your example." which is what I tried to reflect in:
>
> >> 1. A pointer must contain a value that points into (or one past) an
> >> object corresponding to its provenance (where a pointer may have
> >> multiple provenances). Otherwise it invokes undefined behavior.
>
> But as you say, that is wrong (and the vectorizer and ifconv do indeed
> perform transformations that would be invalid with this semantics). So
> what is the correct rule here for "pointers stay within an object"? All
> ideas I have tried fails in different ways... :(
>
> ---
>
> I also have a somewhat related question regarding:
>
> >> 2. The provenance used for the result of POINTER_PLUS is the union of
> >> the provenances for the two arguments.
> >
> > For POINTER_PLUS it's the provenance of the first argument.
>
> >> 4. The rules are the same for the calculations done in MEM_REF and
> >> TARGET_MEM_REF as for POINTER_PLUS.
> >
> > Yes.
>
> The ifconv pass sometimes rewrites memory accesses as:
>
>    _84 = &MEM[(float *)0B + _83 + ivtmp.41_75 * 4];
>    MEM[(float *)_84] = _3;
>
> which you can see by compiling testsuite/gcc.dg/sms-11.c for x86_64 with
> -O1.
>
> If TARGET_MEM_REF works like POINTER_PLUS, which does not propagate
> provenance through integers, then this store has no provenance and invokes
> undefined behavior. So is ifopts buggy, or do TARGET_MEM_REF propagate
> provenance from index/offset?
>
>    /Krister
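
To make the example concrete, here is a toy C model of the provenance
rules as Richard stated them in the quoted text (POINTER_PLUS keeps the
first operand's provenance, PLUS_EXPR takes the union, POINTER_DIFF_EXPR
has none). Every name in it is made up for illustration and none of it is
GCC internals. It only tracks numeric values and provenance sets, and it
takes no position on whether the comparison itself is defined.

```c
#include <assert.h>
#include <stdint.h>

/* Toy value: numeric bits plus a provenance bitset (one bit per object).
   Hypothetical names throughout; this is not how GCC represents it. */
typedef struct {
    uint64_t bits;   /* numeric value of the pointer or integer */
    unsigned prov;   /* objects it may point into; 0 means no provenance */
} value;

enum { PROV_P = 1u << 0, PROV_A = 1u << 1 };

/* POINTER_PLUS_EXPR: provenance comes from the pointer operand only. */
static inline value pointer_plus(value ptr, value off)
{
    value r = { ptr.bits + off.bits, ptr.prov };
    return r;
}

/* PLUS_EXPR: provenance is the union of both operands. */
static inline value plus_expr(value a, value b)
{
    value r = { a.bits + b.bits, a.prov | b.prov };
    return r;
}

/* POINTER_DIFF_EXPR: the result carries no provenance. */
static inline value pointer_diff(value a, value b)
{
    value r = { a.bits - b.bits, 0u };
    return r;
}

/* The execution from the quoted mail: p = 0x1000000, i = 0x1000000,
   &a = 0x2000000. Returns nonzero when _2 and &a have equal bits. */
static inline int example_bits_equal(void)
{
    value p  = { 0x1000000u, PROV_P };
    value i  = { 0x1000000u, 0u };
    value a  = { 0x2000000u, PROV_A };
    value t2 = pointer_plus(p, i);   /* models _2 = p + i */

    /* _2 carries p's provenance only, which is disjoint from a's. */
    assert(t2.prov == PROV_P);
    assert((t2.prov & a.prov) == 0u);
    return t2.bits == a.bits;
}
```

In this model the bits compare equal while the provenance sets are
disjoint, which is exactly the case the thread is arguing about.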
