On 8/8/25 3:31 AM, Richard Sandiford wrote:
In g:965564eafb721f8000013a3112f1bba8d8fae32b I'd added code
to try distributing non-widening subregs through logic ops,
in cases where that would eliminate a term of the logic op.
For "reasons", this indirectly caused combine to generate:
  (set (zero_extract:SI (reg/v:SI 101 [ a ])
                        (const_int 8 [0x8])
                        (const_int 8 [0x8]))
       (not:SI (sign_extract:SI (reg:SI 107 [ b ])
                                (const_int 8 [0x8])
                                (const_int 8 [0x8]))))
instead of:
  (set (zero_extract:SI (reg/v:SI 101 [ a ])
                        (const_int 8 [0x8])
                        (const_int 8 [0x8]))
       (subreg:SI (not:QI (subreg:QI (sign_extract:SI (reg:SI 107 [ b ])
                                                      (const_int 8 [0x8])
                                                      (const_int 8 [0x8])) 0)) 0))
for some tests that were intended to match x86's *one_cmplqi_ext<mode>_1
(see g:a58d770fa1d17ead3c38417b299cce3f19f392db). However, other more
direct ways of generating the pattern continued to have the unsimplified
(subreg:SI (not:QI (subreg:QI (...:SI ...)))) structure, since that
structure wasn't the focus of the original patch.
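To see why the two RTL forms above are interchangeable, here is a minimal C++ sketch that models them on plain 32-bit integers. It is only an illustration, not GCC code: the helper names (sign_extract8, insert8, wide_form, narrow_form, check_all_fields) are hypothetical, and the paradoxical subreg's undefined upper bits are modelled as zero, which is one permitted choice. Because the destination is an 8-bit zero_extract, only the low 8 bits of the source matter, so doing the NOT in SImode or in QImode stores the same byte.

```cpp
#include <cassert>
#include <cstdint>

// Models (sign_extract:SI b 8 8): the 8-bit field at bits 8..15,
// sign-extended to 32 bits in a portable way.
std::int32_t sign_extract8 (std::uint32_t b)
{
  std::int32_t v = static_cast<std::int32_t> ((b >> 8) & 0xffu);
  return (v ^ 0x80) - 0x80;
}

// Models (set (zero_extract:SI a 8 8) val): only the low 8 bits of VAL
// are stored, into bits 8..15 of A.
std::uint32_t insert8 (std::uint32_t a, std::uint32_t val)
{
  return (a & ~0xff00u) | ((val & 0xffu) << 8);
}

// Simplified form: the NOT is done in the 32-bit mode.
std::uint32_t wide_form (std::uint32_t a, std::uint32_t b)
{
  return insert8 (a, ~static_cast<std::uint32_t> (sign_extract8 (b)));
}

// Unsimplified form: narrow to 8 bits, NOT in the 8-bit mode, then widen.
// Assumption: the widened value's upper bits are taken to be zero.
std::uint32_t narrow_form (std::uint32_t a, std::uint32_t b)
{
  std::uint8_t q = static_cast<std::uint8_t> (sign_extract8 (b));
  std::uint8_t n = static_cast<std::uint8_t> (~q);
  return insert8 (a, n);
}

// Checks every possible value of the extracted field for a given A.
void check_all_fields (std::uint32_t a)
{
  for (std::uint32_t field = 0; field < 256; ++field)
    assert (wide_form (a, field << 8) == narrow_form (a, field << 8));
}
```

Calling check_all_fields with any value of a exercises all 256 field values and should not trip the assertion.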
This patch tries to tackle that simplification head-on. It's another
case of distributing subregs, but this time for non-narrowing rather
than non-widening subregs. We already do the same distribution for
word_mode:
  /* Attempt to simplify WORD_MODE SUBREGs of bitwise expressions.  */
  if (outermode == word_mode
      && (GET_CODE (op) == IOR || GET_CODE (op) == XOR || GET_CODE (op) == AND)
      && SCALAR_INT_MODE_P (innermode))
    {
      rtx op0 = simplify_subreg (outermode, XEXP (op, 0), innermode, byte);
      rtx op1 = simplify_subreg (outermode, XEXP (op, 1), innermode, byte);
      if (op0 && op1)
        return simplify_gen_binary (GET_CODE (op), outermode, op0, op1);
    }
which g:0340177d54d08b6375391ba164a878e6a596275e extended to NOT.
For word_mode, there are (reasonably) no restrictions on the inner
mode other than that it is an integer. Doing word_mode logic ops
should be at least as efficient as subword logic ops (if the target
provides subword ops at all). And word_mode logic ops should be
cheaper than multi-word logic ops.
Well, there are targets where sub-word ops can be more efficient, though
I suspect they're very much in the minority and perhaps all dead
architectures at this point. The one that immediately comes to mind is
the H8. The smaller modes tend to have more compact encodings and
potentially take fewer cycles as well. Though I wouldn't let these
oddballs derail where you're trying to go.
But here we need the distribution for SImode rather than word_mode
(DImode). The patch therefore extends the word_mode distributions
to non-narrowing subregs in which the two modes occupy the same
number of words. This should hopefully be relatively conservative.
It prevents the new rule from going away from word_mode, and attempting
to convert (say) a QImode subreg of a word_mode AND into a QImode AND.
It should be suitable for both CISCy and RISCy targets, including
those that define WORD_REGISTER_OPERATIONS.
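As a rough illustration of the condition described above, the following C++ sketch models it on raw byte sizes. This is an assumption-laden model, not the code from the patch: the names words_for and may_distribute are hypothetical, and units_per_word stands in for the target's word size in bytes. It covers only the new rule; the existing word_mode rule is separate.

```cpp
#include <cstddef>

// Number of words needed to hold a value of the given size, rounding up.
constexpr std::size_t words_for (std::size_t bytes, std::size_t units_per_word)
{
  return (bytes + units_per_word - 1) / units_per_word;
}

// Hypothetical model of the new rule: the subreg must not narrow, and
// the outer and inner modes must occupy the same number of words.
constexpr bool may_distribute (std::size_t outer_bytes,
                               std::size_t inner_bytes,
                               std::size_t units_per_word)
{
  return outer_bytes >= inner_bytes
         && words_for (outer_bytes, units_per_word)
            == words_for (inner_bytes, units_per_word);
}

// SImode subreg of a QImode op on a 64-bit target: allowed.
static_assert (may_distribute (4, 1, 8), "SI of QI, one word each");
// QImode subreg of a word_mode (DImode) op: narrowing, so rejected.
static_assert (!may_distribute (1, 8, 8), "QI of DI narrows");
// TImode subreg of a DImode op: two words versus one, so rejected.
static_assert (!may_distribute (16, 8, 8), "TI of DI changes word count");
```

Under this model, the SImode case from the PR qualifies on a 64-bit target, while a QImode subreg of a word_mode AND does not, matching the intent that the rule never moves away from word_mode.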
The patch also fixes some overlong lines in related code.
Tested on aarch64-linux-gnu and x86_64-linux-gnu. OK to install?
Richard
gcc/
PR rtl-optimization/121306
* simplify-rtx.cc (simplify_context::simplify_subreg): Distribute
non-narrowing integer-to-integer subregs through logic ops,
in a similar way to the existing word_mode handling.
---
OK
jeff