On 11/2/24 2:58 AM, Alexey Merzlyakov wrote:
This patch adds optimization of the following patterns:
(zero_extend:M (subreg:N (not:O==M (X:Q==M)))) ->
(xor:M (zero_extend:M (subreg:N (X:M)), 0xffff))
... where mask takes 0xffff bits of N mode bitsize
For the cases when X:M doesn't have any non-zero bits outside of mode N,
(zero_extend:M (subreg:N (X:M)) could be simplified to just (X:M)
and whole optimization will be:
(zero_extend:M (subreg:N (not:M (X:M)))) ->
(xor:M (X:M, 0xffff))
Patch targets to handle code patterns like:
not a0,a0
andi a0,a0,0xff
to be optimized to:
xori a0,a0,255
Just to be clear we don't need to worry about paradoxical subregs in
this case. To have a paradoxical N would have to be larger than M which
would be invalid for the zero_extend expression.
But as your implementation shows, we do need to be somewhat careful
about bits in X outside of mode M that may be set. Concretely let's
take a HImode inner object with a QImode narrowing subreg
(zero_extend:HI (subreg:QI (not:HI (X:HI)))) where X has the value 0x8000.
That expression says unambiguously that the result is 0xff, even though
the subreg indicates bits outside QI are don't cares, the outer
zero_extend sets them to 0. If we naively changed this to the xor X,
0xff form we'd generate 0x80ff, which would be incorrect.
Change was locally tested for x86_64 and AArch64 (as most common)
and for RV-64 and MIPS-32 targets (as having an effect from this
optimization):
no regressions for all cases.
gcc/ChangeLog:
* simplify-rtx.cc (simplify_context::simplify_unary_operation_1):
Simplify ZERO_EXTEND (SUBREG (NOT X)) to XOR (X, 0xff...f) when X
doesn't have any non-zero bits outside of SUBREG mode.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/pr112398.c: New test.
Signed-off-by: Alexey Merzlyakov <alexey.merzlya...@samsung.com>
---
gcc/simplify-rtx.cc | 23 +++++++++++++++++++++++
gcc/testsuite/gcc.target/riscv/pr112398.c | 14 ++++++++++++++
2 files changed, 37 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/riscv/pr112398.c
diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
index 2c04ce960ee..608ecedbcb8 100644
--- a/gcc/simplify-rtx.cc
+++ b/gcc/simplify-rtx.cc
@@ -1842,6 +1842,29 @@ simplify_context::simplify_unary_operation_1
(rtx_code code, machine_mode mode,
& ~GET_MODE_MASK (op_mode)) == 0)
return SUBREG_REG (op);
+ /* Trying to optimize:
+ (zero_extend:M (subreg:N (not:M (X:M)))) ->
+ (xor:M (zero_extend:M (subreg:N (X:M)), 0xffff))
+ where mask takes 0xffff bits of N mode bitsize.
"where the mask is GET_MODE_MASK (N)" is probably clearer to folks that
have been working on GCC for a while. GET_MODE_MASK is the bits set in
mode N. So for QI -> 0xff, HI would be 0xffff and so-on.
+ For the cases when X:M doesn't have any non-zero bits
+ outside of mode N, (zero_extend:M (subreg:N (X:M))
+ will be simplified to just (X:M)
+ and whole optimization will be -> (xor:M (X:M), 0xffff). */
+ if (GET_CODE (op) == SUBREG
Write this as "SUBREG_P (op)"
+ && GET_CODE (XEXP (op, 0)) == NOT
+ && GET_MODE (XEXP (op, 0)) == mode
+ && GET_MODE (XEXP (XEXP (op, 0), 0)) == mode
I suspect the last mode check is redundant. We're not supposed to have
the argument of a code like NOT have a different mode than the code.
+ && subreg_lowpart_p (op)
Normally you'd probably need to check !paradoxical_subreg_p as well, but
that case can't happen here because the mode of the outer zero_extend
and thus the inner not must be wider than the mode of the subreg.
+ && (nonzero_bits (XEXP (XEXP (op, 0), 0), mode)
+ & ~GET_MODE_MASK (GET_MODE (XEXP (XEXP (op, 0), 0)))) == 0)
Isn't GET_MODE (XEXP (XEXP (op, 0), 0)) the same as "mode"? Use the
value in the local variable, it's easier to read IMHO.
I suspect the formatting is off as well. We format conditionals like this:
if (a
&& b
&& c)
Note how the "&&" lines up under the first argument to the IF.
+ {
+ const uint64_t mask
+ = ~((uint64_t)~0 << GET_MODE_BITSIZE (GET_MODE (op)).coeffs[0]);
Probably shouldn't be looking directly at .coeffs field. Instead there
are methods to convert that value to an integer. Look for the
"to_constant ()" method. so
GET_MODE_SIZE (GET_MODE (op)).to_constant ()
Note that this transformation may not work for modes with nonconstant
sizes. So you might need to check the is_constant () method.
+ return simplify_gen_binary (XOR, mode,
+ XEXP (XEXP (op, 0), 0),
+ gen_int_mode (mask, mode));
Formatting looks goofy here too.
Line up each argument under the argument in the prior line ie
foo (XOR, mode
XEXP (XEXP (op, 0), 0)
get_int_mode (mask, mode))
Overall it looks pretty good, but there are a few things you need to fix
up and repost.
Jeff