On Fri, Dec 12, 2025 at 1:43 PM Martin Uecker <[email protected]> wrote:
>
> Am Freitag, dem 12.12.2025 um 18:50 +0700 schrieb Jason Merrill:
> > On 12/12/25 2:10 PM, Martin Uecker wrote:
> > > Am Freitag, dem 12.12.2025 um 09:21 +0700 schrieb Jason Merrill via Gcc:
> > > > On Fri, Dec 12, 2025, 8:57 a.m. Krister Walfridsson via Gcc 
> > > > <[email protected]>
> > > > wrote:
> > > >
> > > > > On Thu, 11 Dec 2025, Richard Biener wrote:
> > > > >
> > > > > > Date: Thu, 11 Dec 2025 12:38:43 +0100
> > > > > > From: Richard Biener <[email protected]>
> > > > > > To: Krister Walfridsson <[email protected]>
> > > > > > Cc: [email protected]
> > > > > > Subject: Re: pointer comparison in GIMPLE
> > > > > >
> > > > > > On Thu, Dec 11, 2025 at 5:12 AM Krister Walfridsson
> > > > > > <[email protected]> wrote:
> > > > > > >
> > > > > > > On Wed, 10 Dec 2025, Richard Biener wrote:
> > > > > > >
> > > > > > > > > The problem is that in GIMPLE, a pointer does not need to be 
> > > > > > > > > in
> > > > > bounds. The caller could call the function with a value of i such 
> > > > > that p +
> > > > > i happens to be equal to &a. So, as I understand it, the GIMPLE 
> > > > > semantics
> > > > > do not allow the pass to conclude that p + i == &a is false, unless p 
> > > > > + i
> > > > > is dereferenced (because dereferencing a through p + i would be UB 
> > > > > due to
> > > > > provenance).
> > > > > > > >
> > > > > > > > GIMPLE adopts most of the C pointer restrictions here thus we 
> > > > > > > > can (and
> > > > > do) conclude that pointers stay within an object when advanced.  This 
> > > > > is
> > > > > used by the PTA pass which results are used when we optimize your 
> > > > > example.
> > > > > You have to divert to integer arithmetic to circumvent this and the 
> > > > > PTA
> > > > > pass, while tracking provenance through integers as well, does the 
> > > > > right
> > > > > thing with this.
> > > > > > >
> > > > > > > Great, that is much better for smtgcc than the semantics I have
> > > > > currently
> > > > > > > implemented!
> > > > > > >
> > > > > > > But it is not completely clear to me what "most of the C pointer
> > > > > > > restrictions" implies. Is the following a correct interpretation?
> > > > > > >
> > > > > > > 1. A pointer must contain a value that points into (or one past) 
> > > > > > > an
> > > > > object
> > > > > > > corresponding to its provenance (where a pointer may have multiple
> > > > > > > provenances). Otherwise it invokes undefined behavior.
> > > > > >
> > > > > > Hmm.  I think it's only UB when you'd "use" that pointer.  That is, 
> > > > > > PTA
> > > > > would
> > > > > > compute the points-to set to 'nothing'.  The immediate consequences 
> > > > > > are
> > > > > such
> > > > > > pointer isn't equal to any other pointer and accesses through it 
> > > > > > alias
> > > > > > with nothing,
> > > > > > stores would be DSEd.  But at the point a SSA var is assigned such a
> > > > > pointer
> > > > > > we couldn't place a trap() (?)
> > > > > >
> > > > > > > 2. The provenance used for the result of POINTER_PLUS is the 
> > > > > > > union of
> > > > > the
> > > > > > > provenances for the two arguments.
> > > > > >
> > > > > > For POINTER_PLUS it's the provenance of the first argument.
> > > > > >
> > > > > > For PLUS_EXPR it is the union of both arguments.  For 
> > > > > > POINTER_DIFF_EXPR
> > > > > > the result has no provenance.
> > > > > >
> > > > > > > 3. The POINTER_PLUS operation is UB if the calculation overflows 
> > > > > > > and
> > > > > > > TYPE_OVERFLOW_WRAPS(ptr_type) is false.
> > > > > >
> > > > > > Yes.
> > > > > >
> > > > > > > 4. The rules are the same for the calculations done in MEM_REF and
> > > > > > > TARGET_MEM_REF as for POINTER_PLUS.
> > > > > >
> > > > > > Yes.
> > > > > >
> > > > > > > Question: For the TARGET_MEM_REF calculation:
> > > > > > >     BASE + STEP * INDEX + INDEX2 + OFFSET
> > > > > > > Is it treated as one POINTER_PLUS, i.e.
> > > > > > >     BASE + (STEP * INDEX + INDEX2 + OFFSET)
> > > > > > > or as two (i.e. do we care about overflow and OOB between the two 
> > > > > > > index
> > > > > > > calculations)?
> > > > > >
> > > > > > I'd say it counts as one pointer + offset calculation with all the 
> > > > > > offset
> > > > > > calculation being done in wrapping operations.
> > > > >
> > > > > Your answers match exactly what is currently implemented in smtgcc, 
> > > > > so I
> > > > > am still thinking this is a bug in GCC (or that there is some missing
> > > > > GIMPLE rule I must implement).
> > > > >
> > > > > The original program looks in GIMPLE like:
> > > > >
> > > > >     void foo (char * p, long long int i)
> > > > >     {
> > > > >       char a;
> > > > >       sizetype i.0_1;
> > > > >       char * _2;
> > > > >
> > > > >       <bb 2> :
> > > > >       i.0_1 = (sizetype) i_3(D);
> > > > >       _2 = p_4(D) + i.0_1;
> > > > >       if (_2 == &a)
> > > > >         goto <bb 3>;
> > > > >       else
> > > > >         goto <bb 4>;
> > > > >
> > > > >       <bb 3> :
> > > > >       __builtin_abort ();
> > > > >
> > > > >       <bb 4> :
> > > > >       a ={v} {CLOBBER(eos)};
> > > > >       return;
> > > > >     }
> > > > >
> > > > > Assume, for the sake of argument, that the address of a is 0x2000000, 
> > > > > p =
> > > > > 0x1000000 and i = 0x1000000.
> > > > >
> > > > > With the semantics as described in this mail thread, all operations 
> > > > > are
> > > > > defined:
> > > > >    * _2 evaluates to 0x2000000, with the provenance of p (although the
> > > > >      provenance is irrelevant in this execution).
> > > > >    * The comparison is also defined and evaluates to true.
> > > > >    * As a result, the program then calls __builtin_abort, which exits.
> > > > >
> > > > > A valid optimization must produce the same result (including side 
> > > > > effects)
> > > > > given the same input for executions where all steps have defined
> > > > > semantics.
> > > >
> > > >
> > > > Except this comparison has an unspecified value, so an optimization is
> > > > allowed to change it.
> > >
> > > Is it?  For, C this is defined.
> >
> > Ah, indeed this seems to be a difference between C and C++.  C++:
> >
> > https://eel.is/c++draft/expr#eq-3.1
> > "If one pointer represents the address of a complete object, and another
> > pointer represents the address one past the last element of a different
> > complete object, the result of the comparison is unspecified."
> >
> > C:
> >
> > "Two pointers compare equal if and only if ... one is a pointer to one
> > past the end of one array object and the other is a pointer to the start
> > of a different array object that happens to immediately follow the first
> > array object in the address space."
> >

We optimize this with the C++ rules because points-to info does not
track whether we can prove pointers will stay within the object or
eventually point to one-after.

> We certainly also considered changes for C.  But mostly
> the problem with "unspecified" is that it practice it
> is not a lot better than "undefined".  Optimizers
> confuse themselves if they then have other
> optimizations which rely on the stability of the value.
> For example, see the comment by Alexander Cherepanov:
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61502#c42
>
>
> One could make comparisons for unrelated pointers for the
> case that one is a one-after pointer undefined in C (and C++),
> but this is difficult to sell.
>
> So for me it seems that GCC should be changed to make
> these comparisons deterministic based on the address as
> required by the C standard.  Conditional equivalences need
> to have guards against the special case of one-after pointers.
> (for example, after a pointer is dereferenced once, one
> can be sure it is not a one-after pointer)
>
> Of course, GIMPLE could also have other semantics than
> C/C++, but then we need a way to express C/C++ semantics
> correctly.  For example, if we had
> __builtin_expose / __builtin_synthesize that mark
> a pointer escaped or as of unknown provenance,
> respectively (maybe we have something like this?)
> then the front-ends could add those for casts from/to
> integers and for comparisons to implement the desired
> semantics.  I assume the Rust FE will also need a solution.
>
> Martin

Reply via email to