Re: [PATCH] match.pd: Optimize (A ^ B) & C ^ B to (A & C)|(B & ~C) late in pipeline

Kishan Parmar Mon, 29 Dec 2025 02:41:59 -0800

Hello Andrew and Segher,

On 23/12/25 2:03 am, Segher Boessenkool wrote:
> Hi!
>
> On Tue, Dec 23, 2025 at 01:02:08AM +0530, Kishan Parmar wrote:
>> On 17/12/25 2:18 pm, Andrew Pinski wrote:
>>> Note you have the same issue with the following C example before this
>>> patch (just after you get with the original form too):
>>> ```
>>> unsigned foo0 (unsigned x, unsigned y)
>>> {
>>>   unsigned x1 = x >> 31;
>>>   unsigned yy = y & ~1;
>>>   return x1 | yy;
>>> }
>>> ```
>>>
>>> That was something I was trying to point out.
>>> And why having the check for INTEGER_CST is just exchanging one issue
>>> for another. It does not solve the new issue here.
>>>
>>> Wait, I think we are losing/not using the SUBREG_PROMOTED_UNSIGNED_P
>>> fact here which is causing the issues really.
>>> We lost it when we did the original 2->7 combine (for foo0 above):
>>> ```
>>> Trying 2 -> 7:
>>>     2: r121:DI=r128:DI
>>>       REG_DEAD r128:DI
>>>       REG_EQUIV [ap:DI+0x30]
>>>     7: r124:SI=r121:DI#4 0>>0x1f
>>>       REG_DEAD r121:DI
>>> Failed to match this instruction:
>>> (set (subreg:DI (reg:SI 124 [ x1_2 ]) 0)
>>>     (zero_extract:DI (reg:DI 128 [ x+-4 ])
>>>         (const_int 32 [0x20])
>>>         (const_int 1 [0x1])))
>>> Successfully matched this instruction:
>>> (set (subreg:DI (reg:SI 124 [ x1_2 ]) 0)
>>>     (and:DI (lshiftrt:DI (reg:DI 128 [ x+-4 ])
>>>             (const_int 31 [0x1f]))
>>>         (const_int 4294967295 [0xffffffff])))
>>> ```
>>>
>>> insn 7 was:
>>> ```
>>> (insn 7 4 9 2 (set (reg:SI 124 [ x1_2 ])
>>>         (lshiftrt:SI (subreg/s/v:SI (reg/v:DI 121 [ x+-4 ]) 4)
>>>             (const_int 31 [0x1f]))) "/app/example.cpp":30:12 292 {lshrsi3}
>>>      (expr_list:REG_DEAD (reg/v:DI 121 [ x+-4 ])
>>>         (nil)))
>>> ```
>>> insn 2 was just move.
>>> Note /s/v on the subreg there.
>>> If we didn't lose the SUBREG_PROMOTED_UNSIGNED_P, then things would just 
>>> work.
>>> So you need to figure out how to get
>>>
>>> So again the INTEGER_CST is just working around an issue down the line
>>> and not really a good idea; in the case of ppc, the losing of the
>>> SUBREG_PROMOTED_UNSIGNED_P.
>>>
>>> Thanks,
>>> Andrew Pinski
>> Hello Andrew and Segher,
>>
>> I went through combine pass traces, and the loss of the 
>> SUBREG_PROMOTED_UNSIGNED_P flag is during simplification.
> Documentation says this is printed as /u, which makes sense.  But that
> doesn't seem to happen?  It also says it is printed as /v, which *does*
> happen.  The documentation needs fixing.
>
> Either way, this stuff is meant to be an optimisation only (it is set
> for subregs where the extension in the larger reg is already known!)
>
>> The issue specifically happens in simplify_shift_const.
>> It attempts to commute the shift with the subreg to "widen" the operation to 
>> the native mode (DI),
>> but in doing so, it destroys the metadata.
>>
>> Input (SImode): (lshiftrt:SI (subreg/s/v:SI (reg:DI 128 [ x ]) 0) (const_int 
>> 31))
>> Output (DImode Widened): (subreg:SI (lshiftrt:DI (reg:DI 128 [ x ]) 
>> (const_int 31)) 0)
> lshiftrt:SI leaves the top 32 bits of the resulting thing (seen as
> DImode) completely undefined.  As pretty much all operations on SImode
> values do.
>
>> Let’s say, somehow we manage to keep SUBREG_PROMOTED_UNSIGNED_P after 
>> simplify_shift_const(modify to preserve /s/v flags),
>> and let us assume we have expression
>> (set (reg:SI 124 [ x1_2 ])
>>     (subreg:SI/v/s (lshiftrt:DI (reg:DI 128 [ x ])
>>             (const_int 31 [0x1f])) 0))
> That says that the SImode value of that shift is stored zero-extended in
> the DImode register.  Which is redundant information of course, that's
> just how RTL works.
>
>> But in later simplification, when gcc tries to simplify set.. It tries to 
>> create compund expression from set via make_compound_operation.
>> it typically SUBREG_REG(x) (the inner lshiftrt:DI) and passes that to helper 
>> functions.
>> Functions like gen_lowpart, force_to_mode, or gen_lowpart_or_truncate then 
>> re-wrap the result in a fresh subreg (truncation), losing the flags again.
>>
>> Furthermore, even if we preserve the flag, combine tends to generate a 
>> ZERO_EXTRACT pattern:
>> (set (subreg/s/v:DI (reg:SI 124) 0)
>>     (zero_extract:DI (reg:DI 128) (const_int 32) (const_int 1)))
> Only temporarily.  For targets that do not support zero_extract (like
> rs6000) all this is converted back to simpler, more useful, more
> serviceable, more canonical representation.
>
>> The above result we need to convert again back to
>> (set (reg:SI 124 [ x1_2 ])
>>     (subreg:SI/s/v (zero_extract:DI (reg:DI 128 [ x ])
>>         (const_int 32 [0x20])
>>         (const_int 1 [0x1])) 0))
>>
>> Finally, change_zero_ext would need to detect this specific pattern and 
>> simplify it back to the optimal shift form:
>> (set (reg:SI 124) (lshiftrt:SI (subreg/s/v:SI (reg:DI 128 [ x ]) 0)
>>     (const_int 31 [0x1f])))
>>
>> Preserving the SUBREG_PROMOTED_UNSIGNED_P property is difficult because it 
>> requires
>> updating the resulting SUBREG after every invocation of gen_lowpart, 
>> force_to_mode, or gen_lowpart_or_truncate.
> But it can be 100% reliably deduced from the lshifrt (by a constant
> non-zero amount).
>> So, My question here is should we go with preserve and restore the flag 
>> throughout the pipeline,
>> or should we just avoid widening subregs with SUBREG_PROMOTED_UNSIGNED_P?
Even if we tried to avoid widening here, we loose the flags because during 
simplification in combine pass,
because as mentioned above, gen_lowpart, force_to_mode, or 
gen_lowpart_or_truncate re-wrap the result in a fresh subreg.
>> You should not depend on optional and redundant flags being set *ever*?
>>
>>
After tracing combine pass further, I am convinced that Andrew's initial 
assessment was correct: the loss
of the |SUBREG_PROMOTED_UNSIGNED_P| flag prevents us from generating the 
optimal RTL.


Consider below pattern where combine generated zero_extract.

(set (reg:SI 123 [ _5 ])
    (ior:SI (and:SI (subreg:SI (reg:DI 129 [ y ]) 0)
            (const_int -2 [0xfffffffffffffffe]))
        (subreg:SI (zero_extract:DI (reg:DI 128 [ x ])
                (const_int 32 [0x20])
                (const_int 1 [0x1])) 0)))

GCC attempts to simplify the zero_extract into a shift/mask combination, 
resulting in failure:

Failed to match this instruction:
(set (reg:SI 123 [ _5 ])
    (ior:SI (and:SI (subreg:SI (reg:DI 129 [ y ]) 0)
            (const_int -2 [0xfffffffffffffffe]))
        (subreg:SI (and:DI (lshiftrt:DI (reg:DI 128 [ x ])
                    (const_int 31 [0x1f]))
                (const_int 4294967295 [0xffffffff])) 0)))

we need to push the subreg inside the lshiftrt to get: (lshiftrt:SI (subreg:SI 
(reg:DI 128) 0) 31)

However, we cannot safely perform this transformation effectively asserting 
that the upper 32 bits (including bit 63)
are irrelevant unless we know the operation was originally SImode where those 
bits were undefined.

Without the SUBREG_PROMOTED_UNSIGNED_P flag, GCC must conservatively assume the
upper bits of the DImode value matter. Therefore, it preserves the explicit AND 
mask (const_int 0xffffffff)
to clear them. This mask + mode difference between two operands of IOR is what 
blocks the backend matching.

Segher, I agree that we shouldn't over-rely on optional flags. However, this 
flag provides the most direct way to
track that the subreg corresponds to a promoted variable. If we ignore this 
metadata, we are forced to
generate defensive masking instructions to handle potential 'garbage' bits, 
which unfortunately obscures the pattern
and kills the optimization.

Therefore, we should preserve this flag through the simplification pipeline.

Thanks,
Kishan

Re: [PATCH] match.pd: Optimize (A ^ B) & C ^ B to (A & C)|(B & ~C) late in pipeline

Reply via email to