Re: [COMMITTED] PR tree-optimization/119712 - Always reflect lower bits from mask in subranges.

Andrew MacLeod Wed, 30 Apr 2025 10:11:29 -0700

On 4/30/25 02:56, Richard Biener wrote:

On Wed, Apr 30, 2025 at 12:00 AM Andrew MacLeod <amacl...@redhat.com> wrote:


On 4/28/25 17:26, Andrew MacLeod wrote:

I have committed this patch to trunk after bootstrap/regression
testing again on trunk.

I'll get to gcc14/15 once I flush the current queue.

Andrew


On 4/17/25 06:44, Richard Biener wrote:

On Wed, Apr 16, 2025 at 10:55 PM Andrew MacLeod <amacl...@redhat.com>
wrote:

This was a fun one!   An actual bug, and it took a while to sort out.
After chasing down some red herrings, this turns out to be an issue of
interaction between the range and value masks and intervening
calculations.

The original patch from 11/2023 adjusts intersection so that it can
enhance subranges based on the value mask.  ie in this testcase

[irange] int [-INF, 2147483644] MASK 0xfffffffc VALUE 0x1

    If adjust_range() were called on this, it would eliminate the
trailing
mask/value bit ranges that are invalid and turn it into :

[-INF, -3][1, 1][4, 2147483626] MASK 0xfffffffc VALUE 0x1

reflecting the lower bits into the range.   The problem develops
because
we only apply adjust_range ()  during intersection in an attempt to
avoid expensive work when it isnt needed.

Unfortunately, that is what triggers this infinite loop. Rangers cache
propagates changes, and the algorithm is designed to always improve the
range.  In this case, the first iteration through, _11 receives the
above value, [irange] int [-INF, 2147483644] MASK 0xfffffffc VALUE 0x1
which via the mask, excludes 0, 2 and 3.

The ensuing calculations in block 7 do not trigger a successful
intersection operation, and thus the range pairs are never expanded to
eliminate the lower ranges, and it triggers another change in values
which leads to the next iteration being less precise, but not obviously
so. [irange] int [-INF, 2147483644] MASK 0xfffffffd VALUE 0x0 is a
result of the calculation.   As ranges as suppose to always get better
with this algorithm, we simply compare for difference.. and this range
is different, and thus we replace it. It only excludes 2 and 3.

Next iteration through the less precise range DOES trigger an
intersection operation in block 7, and when that is expanded to
[irange]
int [-INF, 1][4, 2147483644] MASK 0xfffffffd VALUE 0x0 using that we
can
again create the more precise range for _11 that started the cycle. and
we go on and on and on.

If we fix this so that we always expand subranges to reflect the lower
bits in a bitmask, the initial value starts with

[irange] int [-INF, -3][1, 1][4, 2147483644] MASK 0xfffffffc VALUE 0x1

And everything falls into place as it should.  The fix is to be
consistent about expanding those lower subranges.

I also added a couple of minor performance tweaks to avoid unnecessary
work, along with removing adjust_range () directly into
set_range_from_bitmask () .

I started at a 0.2% overall compilation increase (1.8% in VRP). In the
end, this patch is down to 0.6% in VRP, and only 0.08% overall, so
manageable for all the extra work.

It also causes a few ripples in the testsuite so 3 test cases also
needed adjustment:

    * gcc.dg/pr83072-2.c :  With the addition of the expanded ranges,
CCP
use to export a global:
       Global Exported: c_3 = [irange] int [-INF, +INF] MASK 0xfffffffe
VALUE 0x1
and now
      Global Exported: c_3 = [irange] int [-INF, -1][1, +INF] MASK
0xfffffffe VALUE 0x1
Which in turn enables forwprop to collapse part of the testcase much
earlier.     So I turned off forwprop for the testcase

* gcc.dg/tree-ssa/phi-opt-value-5.c  : WIth the expanded ranges, CCP2
pass use to export:
      Global Exported: d_3 = [irange] int [-INF, +INF] MASK 0xfffffffe
VALUE 0x1
and now
      Global Exported: d_3 = [irange] int [-INF, -1][1, +INF] MASK
0xfffffffe VALUE 0x1
which in turn makes the following comment obsolete as the optimization
does happen earlier.:
/* fdiv1 requires until later than phiopt2 to be able to detect that
      d is non-zero. to be able to remove the conditional.  */
Adjusted the testcase to expect everything to be taken care of by
phi-opt2 pass.

    * gcc.dg/tree-ssa/vrp122.c : Previously, CCP exported:
      Global Exported: g_4 = [irange] unsigned int [0, +INF] MASK
0xfffffff0 VALUE 0x0
and then EVRP refined that and stored it, then the testcase tested for:
      Global Exported: g_4 = [irange] unsigned int [0, 0][16, +INF] MASK
0xfffffff0 VALUE 0x0
Now, CCP itself exported the expanded range, so there is nothing for
VRP
to do.
adjusted the testcase to look for the expanded range in CCP.

Now we never get into this situation where the bitmask is explicitly
applied in some places and not others.

Bootstraps on x86_64-pc-linux-gnu with no regressions. Finally.   Is
this OK for trunk, or should I hold off a little bit?

Please wait a little bit until after 15.1 is out.  It's then OK for
trunk in
stage1 and backports when no issues show up.

Thanks,
Richard.

Andrew

This is now in trunk.   Attached are the patches for gcc15 and gcc14.

Bootstrapped with no regressions on x86_64-pc-linux-gnu.

Do you want me to check it in for either or both branches?

OK for 15.

Richard.


Pushed.

Andrew

Re: [COMMITTED] PR tree-optimization/119712 - Always reflect lower bits from mask in subranges.

Reply via email to