On Thu, 11 Dec 2025, Richard Biener wrote:

Date: Thu, 11 Dec 2025 12:38:43 +0100
From: Richard Biener <[email protected]>
To: Krister Walfridsson <[email protected]>
Cc: [email protected]
Subject: Re: pointer comparison in GIMPLE

On Thu, Dec 11, 2025 at 5:12 AM Krister Walfridsson
<[email protected]> wrote:

On Wed, 10 Dec 2025, Richard Biener wrote:

The problem is that in GIMPLE, a pointer does not need to be in bounds. The caller 
could call the function with a value of i such that p + i happens to be equal to 
&a. So, as I understand it, the GIMPLE semantics do not allow the pass to conclude 
that p + i == &a is false, unless p + i is dereferenced (because dereferencing a 
through p + i would be UB due to provenance).

GIMPLE adopts most of the C pointer restrictions here thus we can (and do) 
conclude that pointers stay within an object when advanced.  This is used by 
the PTA pass which results are used when we optimize your example.  You have to 
divert to integer arithmetic to circumvent this and the PTA pass, while 
tracking provenance through integers as well, does the right thing with this.

Great, that is much better for smtgcc than the semantics I have currently
implemented!

But it is not completely clear to me what "most of the C pointer
restrictions" implies. Is the following a correct interpretation?

1. A pointer must contain a value that points into (or one past) an object
corresponding to its provenance (where a pointer may have multiple
provenances). Otherwise it invokes undefined behavior.

Hmm.  I think it's only UB when you'd "use" that pointer.  That is, PTA would
compute the points-to set to 'nothing'.  The immediate consequences are such
pointer isn't equal to any other pointer and accesses through it alias
with nothing,
stores would be DSEd.  But at the point a SSA var is assigned such a pointer
we couldn't place a trap() (?)

2. The provenance used for the result of POINTER_PLUS is the union of the
provenances for the two arguments.

For POINTER_PLUS it's the provenance of the first argument.

For PLUS_EXPR it is the union of both arguments.  For POINTER_DIFF_EXPR
the result has no provenance.

3. The POINTER_PLUS operation is UB if the calculation overflows and
TYPE_OVERFLOW_WRAPS(ptr_type) is false.

Yes.

4. The rules are the same for the calculations done in MEM_REF and
TARGET_MEM_REF as for POINTER_PLUS.

Yes.

Question: For the TARGET_MEM_REF calculation:
   BASE + STEP * INDEX + INDEX2 + OFFSET
Is it treated as one POINTER_PLUS, i.e.
   BASE + (STEP * INDEX + INDEX2 + OFFSET)
or as two (i.e. do we care about overflow and OOB between the two index
calculations)?

I'd say it counts as one pointer + offset calculation with all the offset
calculation being done in wrapping operations.

Your answers match exactly what is currently implemented in smtgcc, so I am still thinking this is a bug in GCC (or that there is some missing GIMPLE rule I must implement).

The original program looks in GIMPLE like:

  void foo (char * p, long long int i)
  {
    char a;
    sizetype i.0_1;
    char * _2;

    <bb 2> :
    i.0_1 = (sizetype) i_3(D);
    _2 = p_4(D) + i.0_1;
    if (_2 == &a)
      goto <bb 3>;
    else
      goto <bb 4>;

    <bb 3> :
    __builtin_abort ();

    <bb 4> :
    a ={v} {CLOBBER(eos)};
    return;
  }

Assume, for the sake of argument, that the address of a is 0x2000000, p = 0x1000000 and i = 0x1000000.

With the semantics as described in this mail thread, all operations are defined:
 * _2 evaluates to 0x2000000, with the provenance of p (although the
   provenance is irrelevant in this execution).
 * The comparison is also defined and evaluates to true.
 * As a result, the program then calls __builtin_abort, which exits.

A valid optimization must produce the same result (including side effects) given the same input for executions where all steps have defined semantics. Therefore, an optimization that does not call __builtin_abort for this input is buggy (or the semantics is incorrect).

You said in your second mail: "GIMPLE adopts most of the C pointer restrictions here thus we can (and do) conclude that pointers stay within an object when advanced. This is used by the PTA pass which results are used when we optimize your example." which is what I tried to reflect in:

1. A pointer must contain a value that points into (or one past) an object
corresponding to its provenance (where a pointer may have multiple
provenances). Otherwise it invokes undefined behavior.

But as you say, that is wrong (and the vectorizer and ifconv do indeed perform transformations that would be invalid with this semantics). So what is the correct rule here for "pointers stay within an object"? All ideas I have tried fails in different ways... :(

---

I also have a somewhat related question regarding:

2. The provenance used for the result of POINTER_PLUS is the union of the
provenances for the two arguments.

For POINTER_PLUS it's the provenance of the first argument.

4. The rules are the same for the calculations done in MEM_REF and
TARGET_MEM_REF as for POINTER_PLUS.

Yes.

The ifconv pass sometimes rewrites memory accesses as:

  _84 = &MEM[(float *)0B + _83 + ivtmp.41_75 * 4];
  MEM[(float *)_84] = _3;

which you can see by compiling testsuite/gcc.dg/sms-11.c for x86_64 with -O1.

If TARGET_MEM_REF works like POINTER_PLUS, which does not propagate provenance through integers, then this store has no provenance and invokes undefined behavior. So is ifopts buggy, or do TARGET_MEM_REF propagate provenance from index/offset?

   /Krister

Reply via email to