Am Freitag, dem 12.12.2025 um 18:50 +0700 schrieb Jason Merrill:
> On 12/12/25 2:10 PM, Martin Uecker wrote:
> > Am Freitag, dem 12.12.2025 um 09:21 +0700 schrieb Jason Merrill via Gcc:
> > > On Fri, Dec 12, 2025, 8:57 a.m. Krister Walfridsson via Gcc 
> > > <[email protected]>
> > > wrote:
> > > 
> > > > On Thu, 11 Dec 2025, Richard Biener wrote:
> > > > 
> > > > > Date: Thu, 11 Dec 2025 12:38:43 +0100
> > > > > From: Richard Biener <[email protected]>
> > > > > To: Krister Walfridsson <[email protected]>
> > > > > Cc: [email protected]
> > > > > Subject: Re: pointer comparison in GIMPLE
> > > > > 
> > > > > On Thu, Dec 11, 2025 at 5:12 AM Krister Walfridsson
> > > > > <[email protected]> wrote:
> > > > > > 
> > > > > > On Wed, 10 Dec 2025, Richard Biener wrote:
> > > > > > 
> > > > > > > > The problem is that in GIMPLE, a pointer does not need to be in
> > > > bounds. The caller could call the function with a value of i such that 
> > > > p +
> > > > i happens to be equal to &a. So, as I understand it, the GIMPLE 
> > > > semantics
> > > > do not allow the pass to conclude that p + i == &a is false, unless p + 
> > > > i
> > > > is dereferenced (because dereferencing a through p + i would be UB due 
> > > > to
> > > > provenance).
> > > > > > > 
> > > > > > > GIMPLE adopts most of the C pointer restrictions here thus we can 
> > > > > > > (and
> > > > do) conclude that pointers stay within an object when advanced.  This is
> > > > used by the PTA pass which results are used when we optimize your 
> > > > example.
> > > > You have to divert to integer arithmetic to circumvent this and the PTA
> > > > pass, while tracking provenance through integers as well, does the right
> > > > thing with this.
> > > > > > 
> > > > > > Great, that is much better for smtgcc than the semantics I have
> > > > currently
> > > > > > implemented!
> > > > > > 
> > > > > > But it is not completely clear to me what "most of the C pointer
> > > > > > restrictions" implies. Is the following a correct interpretation?
> > > > > > 
> > > > > > 1. A pointer must contain a value that points into (or one past) an
> > > > object
> > > > > > corresponding to its provenance (where a pointer may have multiple
> > > > > > provenances). Otherwise it invokes undefined behavior.
> > > > > 
> > > > > Hmm.  I think it's only UB when you'd "use" that pointer.  That is, 
> > > > > PTA
> > > > would
> > > > > compute the points-to set to 'nothing'.  The immediate consequences 
> > > > > are
> > > > such
> > > > > pointer isn't equal to any other pointer and accesses through it alias
> > > > > with nothing,
> > > > > stores would be DSEd.  But at the point a SSA var is assigned such a
> > > > pointer
> > > > > we couldn't place a trap() (?)
> > > > > 
> > > > > > 2. The provenance used for the result of POINTER_PLUS is the union 
> > > > > > of
> > > > the
> > > > > > provenances for the two arguments.
> > > > > 
> > > > > For POINTER_PLUS it's the provenance of the first argument.
> > > > > 
> > > > > For PLUS_EXPR it is the union of both arguments.  For 
> > > > > POINTER_DIFF_EXPR
> > > > > the result has no provenance.
> > > > > 
> > > > > > 3. The POINTER_PLUS operation is UB if the calculation overflows and
> > > > > > TYPE_OVERFLOW_WRAPS(ptr_type) is false.
> > > > > 
> > > > > Yes.
> > > > > 
> > > > > > 4. The rules are the same for the calculations done in MEM_REF and
> > > > > > TARGET_MEM_REF as for POINTER_PLUS.
> > > > > 
> > > > > Yes.
> > > > > 
> > > > > > Question: For the TARGET_MEM_REF calculation:
> > > > > >     BASE + STEP * INDEX + INDEX2 + OFFSET
> > > > > > Is it treated as one POINTER_PLUS, i.e.
> > > > > >     BASE + (STEP * INDEX + INDEX2 + OFFSET)
> > > > > > or as two (i.e. do we care about overflow and OOB between the two 
> > > > > > index
> > > > > > calculations)?
> > > > > 
> > > > > I'd say it counts as one pointer + offset calculation with all the 
> > > > > offset
> > > > > calculation being done in wrapping operations.
> > > > 
> > > > Your answers match exactly what is currently implemented in smtgcc, so I
> > > > am still thinking this is a bug in GCC (or that there is some missing
> > > > GIMPLE rule I must implement).
> > > > 
> > > > The original program looks in GIMPLE like:
> > > > 
> > > >     void foo (char * p, long long int i)
> > > >     {
> > > >       char a;
> > > >       sizetype i.0_1;
> > > >       char * _2;
> > > > 
> > > >       <bb 2> :
> > > >       i.0_1 = (sizetype) i_3(D);
> > > >       _2 = p_4(D) + i.0_1;
> > > >       if (_2 == &a)
> > > >         goto <bb 3>;
> > > >       else
> > > >         goto <bb 4>;
> > > > 
> > > >       <bb 3> :
> > > >       __builtin_abort ();
> > > > 
> > > >       <bb 4> :
> > > >       a ={v} {CLOBBER(eos)};
> > > >       return;
> > > >     }
> > > > 
> > > > Assume, for the sake of argument, that the address of a is 0x2000000, p 
> > > > =
> > > > 0x1000000 and i = 0x1000000.
> > > > 
> > > > With the semantics as described in this mail thread, all operations are
> > > > defined:
> > > >    * _2 evaluates to 0x2000000, with the provenance of p (although the
> > > >      provenance is irrelevant in this execution).
> > > >    * The comparison is also defined and evaluates to true.
> > > >    * As a result, the program then calls __builtin_abort, which exits.
> > > > 
> > > > A valid optimization must produce the same result (including side 
> > > > effects)
> > > > given the same input for executions where all steps have defined
> > > > semantics.
> > > 
> > > 
> > > Except this comparison has an unspecified value, so an optimization is
> > > allowed to change it.
> > 
> > Is it?  For, C this is defined.
> 
> Ah, indeed this seems to be a difference between C and C++.  C++:
> 
> https://eel.is/c++draft/expr#eq-3.1
> "If one pointer represents the address of a complete object, and another 
> pointer represents the address one past the last element of a different 
> complete object, the result of the comparison is unspecified."
> 
> C:
> 
> "Two pointers compare equal if and only if ... one is a pointer to one 
> past the end of one array object and the other is a pointer to the start 
> of a different array object that happens to immediately follow the first 
> array object in the address space."
> 

We certainly also considered changes for C.  But mostly
the problem with "unspecified" is that it practice it
is not a lot better than "undefined".  Optimizers
confuse themselves if they then have other
optimizations which rely on the stability of the value.
For example, see the comment by Alexander Cherepanov:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61502#c42


One could make comparisons for unrelated pointers for the
case that one is a one-after pointer undefined in C (and C++),
but this is difficult to sell.

So for me it seems that GCC should be changed to make
these comparisons deterministic based on the address as
required by the C standard.  Conditional equivalences need
to have guards against the special case of one-after pointers.
(for example, after a pointer is dereferenced once, one
can be sure it is not a one-after pointer)

Of course, GIMPLE could also have other semantics than
C/C++, but then we need a way to express C/C++ semantics
correctly.  For example, if we had 
__builtin_expose / __builtin_synthesize that mark
a pointer escaped or as of unknown provenance, 
respectively (maybe we have something like this?)
then the front-ends could add those for casts from/to
integers and for comparisons to implement the desired
semantics.  I assume the Rust FE will also need a solution. 

Martin

Reply via email to