https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121778

Jeffrey A. Law <law at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2025-11-02
                 CC|                            |smunnangi1 at ventanamicro dot com
             Status|UNCONFIRMED                 |NEW

--- Comment #7 from Jeffrey A. Law <law at gcc dot gnu.org> ---
So to fix this more generally we're almost certainly going to need a match.pd
pattern.

The key is to realize that the XOR can reassociate in this fairly narrow case.

If we look at the expression (a is an unsigned int just for clarity):

(a << 1) | ((a >> 31) ^ 1)

The bit flip from the XOR is constrained: it only touches bit 0, and bit 0 of
the (a << 1) part of the expression is always zero, so the flip never affects
bits coming from the shift.  In that limited case we can reassociate the XOR.
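
As a rough illustration of why that constraint matters, here is a minimal C
sketch of the underlying identity: x | (y ^ c) equals (x | y) ^ c whenever x
has no bits in common with c.  The function names are made up for this example
and are not GCC internals.

```c
#include <stdint.h>

/* Shape found in the source expression: the XOR sits under the IOR.  */
static uint32_t ior_of_xor(uint32_t x, uint32_t y, uint32_t c)
{
    return x | (y ^ c);
}

/* Shape after moving the XOR outward.  */
static uint32_t xor_of_ior(uint32_t x, uint32_t y, uint32_t c)
{
    return (x | y) ^ c;
}

/* Hypothetical helper: the two shapes are guaranteed to agree when
   x has no set bits that overlap the XOR constant c.  */
static int xor_can_move_out(uint32_t x, uint32_t c)
{
    return (x & c) == 0;
}
```

With x = a << 1 and c = 1 the condition always holds, because the left shift
clears bit 0.  If x and c do overlap (for instance x = 1, y = 0, c = 1) the two
shapes give different results, which is why the transformation has to be
guarded.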

So given this gimple:


  _1 = a_4(D) << 1;
  _2 = a_4(D) >> 31;
  _3 = _2 ^ 1;
  _5 = _1 | _3;

We can reassociate into:

  _1 = a_4(D) << 1;
  _2 = a_4(D) >> 31;
  _3 = _1 | _2;
  _5 = _3 ^ 1;

And in that form we'll recognize the 32-bit rotate and ultimately generate
better code on targets with that capability.
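
The two GIMPLE sequences above can be written at the source level roughly as
follows.  This is only a sketch, assuming a 32-bit unsigned operand; the
function names are invented for the example.

```c
#include <stdint.h>

/* Original form: the XOR is applied before the IOR, which hides the rotate.  */
static uint32_t original_form(uint32_t a)
{
    return (a << 1) | ((a >> 31) ^ 1);
}

/* Rotate left by one, written in the shift/shift/IOR idiom that
   compilers commonly recognize as a rotate.  */
static uint32_t rotl32_by1(uint32_t a)
{
    return (a << 1) | (a >> 31);
}

/* Reassociated form: rotate first, then flip bit 0.  */
static uint32_t reassociated_form(uint32_t a)
{
    return rotl32_by1(a) ^ 1;
}
```

Both forms compute the same value for every input; only the second exposes the
rotate as a single subexpression.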

The question is do we just reorder (canonicalize) and let other code recognize
the rotate, or do we go straight to generating a rotate in match.pd?

I'd tend to lean towards the latter as that approach would have less overall
disturbance in code generation and thus less opportunity for something to go
wrong and cause a performance regression.  Of course that also means we
wouldn't see any secondary benefits from reordering more aggressively.
