On 11/2/24 2:58 AM, Alexey Merzlyakov wrote:
This patch adds optimization of the following patterns:

    (zero_extend:M (subreg:N (not:O==M (X:Q==M)))) ->
    (xor:M (zero_extend:M (subreg:N (X:M)), 0xffff))
      ... where mask takes 0xffff bits of N mode bitsize

For the cases when X:M doesn't have any non-zero bits outside of mode N,
(zero_extend:M (subreg:N (X:M)) could be simplified to just (X:M)
and whole optimization will be:

    (zero_extend:M (subreg:N (not:M (X:M)))) ->
    (xor:M (X:M, 0xffff))

Patch targets to handle code patterns like:
    not   a0,a0
    andi  a0,a0,0xff
to be optimized to:
    xori    a0,a0,255
Just to be clear we don't need to worry about paradoxical subregs in this case. To have a paradoxical N would have to be larger than M which would be invalid for the zero_extend expression.

But as your implementation shows, we do need to be somewhat careful about bits in X outside of mode M that may be set. Concretely let's take a HImode inner object with a QImode narrowing subreg

(zero_extend:HI (subreg:QI (not:HI (X:HI)))) where X has the value 0x8000.

That expression says unambiguously that the result is 0xff, even though the subreg indicates bits outside QI are don't cares, the outer zero_extend sets them to 0. If we naively changed this to the xor X, 0xff form we'd generate 0x80ff, which would be incorrect.




Change was locally tested for x86_64 and AArch64 (as most common)
and for RV-64 and MIPS-32 targets (as having an effect from this
optimization):
no regressions for all cases.

gcc/ChangeLog:
      * simplify-rtx.cc (simplify_context::simplify_unary_operation_1):
      Simplify ZERO_EXTEND (SUBREG (NOT X)) to XOR (X, 0xff...f) when X
      doesn't have any non-zero bits outside of SUBREG mode.

gcc/testsuite/ChangeLog:
      * gcc.target/riscv/pr112398.c: New test.

Signed-off-by: Alexey Merzlyakov <alexey.merzlya...@samsung.com>
---
   gcc/simplify-rtx.cc                       | 23 +++++++++++++++++++++++
   gcc/testsuite/gcc.target/riscv/pr112398.c | 14 ++++++++++++++
   2 files changed, 37 insertions(+)
   create mode 100644 gcc/testsuite/gcc.target/riscv/pr112398.c

diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
index 2c04ce960ee..608ecedbcb8 100644
--- a/gcc/simplify-rtx.cc
+++ b/gcc/simplify-rtx.cc
@@ -1842,6 +1842,29 @@ simplify_context::simplify_unary_operation_1
(rtx_code code, machine_mode mode,
             & ~GET_MODE_MASK (op_mode)) == 0)
       return SUBREG_REG (op);

+      /* Trying to optimize:
+     (zero_extend:M (subreg:N (not:M (X:M)))) ->
+     (xor:M (zero_extend:M (subreg:N (X:M)), 0xffff))
+     where mask takes 0xffff bits of N mode bitsize.
"where the mask is GET_MODE_MASK (N)" is probably clearer to folks that have been working on GCC for a while. GET_MODE_MASK is the bits set in mode N. So for QI -> 0xff, HI would be 0xffff and so-on.


+     For the cases when X:M doesn't have any non-zero bits
+     outside of mode N, (zero_extend:M (subreg:N (X:M))
+     will be simplified to just (X:M)
+     and whole optimization will be -> (xor:M (X:M), 0xffff). */
+      if (GET_CODE (op) == SUBREG
Write this as "SUBREG_P (op)"

+      && GET_CODE (XEXP (op, 0)) == NOT
+      && GET_MODE (XEXP (op, 0)) == mode
+      && GET_MODE (XEXP (XEXP (op, 0), 0)) == mode
I suspect the last mode check is redundant. We're not supposed to have the argument of a code like NOT have a different mode than the code.


+      && subreg_lowpart_p (op)
Normally you'd probably need to check !paradoxical_subreg_p as well, but that case can't happen here because the mode of the outer zero_extend and thus the inner not must be wider than the mode of the subreg.


+      && (nonzero_bits (XEXP (XEXP (op, 0), 0), mode)
+          & ~GET_MODE_MASK (GET_MODE (XEXP (XEXP (op, 0), 0)))) == 0)
Isn't GET_MODE (XEXP (XEXP (op, 0), 0)) the same as "mode"? Use the value in the local variable, it's easier to read IMHO.

I suspect the formatting is off as well.  We format conditionals like this:

if (a
    && b
    && c)

Note how the "&&" lines up under the first argument to the IF.


+      {
+    const uint64_t mask
+      = ~((uint64_t)~0 << GET_MODE_BITSIZE (GET_MODE (op)).coeffs[0]);
Probably shouldn't be looking directly at .coeffs field. Instead there are methods to convert that value to an integer. Look for the "to_constant ()" method. so
GET_MODE_SIZE (GET_MODE (op)).to_constant ()

Note that this transformation may not work for modes with nonconstant sizes. So you might need to check the is_constant () method.

+    return simplify_gen_binary (XOR, mode,
+                    XEXP (XEXP (op, 0), 0),
+                    gen_int_mode (mask, mode));
Formatting looks goofy here too.

Line up each argument under the argument in the prior line ie

  foo (XOR, mode
       XEXP (XEXP (op, 0), 0)
       get_int_mode (mask, mode))


Overall it looks pretty good, but there are a few things you need to fix up and repost.


Jeff

Reply via email to