Hi Segher,

On 22/10/2020 15:39, Segher Boessenkool wrote:
> On Thu, Oct 15, 2020 at 09:59:24AM +0100, Alex Coplan wrote:
> > Currently, make_extraction() identifies where we can emit an ASHIFT of
> > an extend in place of an extraction, but fails to make the corresponding
> > canonicalization/simplification when presented with a MULT by a power of
> > two. Such a representation is canonical when representing a left-shifted
> > address inside a MEM.
> > 
> > This patch remedies this situation: after the patch, make_extraction()
> > now also identifies RTXs such as:
> > 
> > (mult:DI (subreg:DI (reg:SI r)) (const_int 2^n))
> > 
> > and rewrites this as:
> > 
> > (mult:DI (sign_extend:DI (reg:SI r)) (const_int 2^n))
> > 
> > instead of using a sign_extract.
> 
> That is only correct if SUBREG_PROMOTED_VAR_P is true and
> SUBREG_PROMOTED_UNSIGNED_P is false for r.  Is that guaranteed to be
> true here (and how then?)

Sorry, I didn't give enough context here. For this subreg,
SUBREG_PROMOTED_VAR_P is not set, so I agree that this transformation in
isolation is not valid.

The crucial piece of missing information is that we only make this
transformation in calls to make_extraction where len = 32 + n and
pos_rtx = pos = 0 (so we're extracting the bottom 32 + n bits), and
unsignedp is false (so we're doing a sign_extract).

Below is a proposed commit message, updated with this information.

OK for trunk?

Thanks,
Alex

---

Currently, make_extraction() identifies where we can emit an ashift of
an extend in place of an extraction, but fails to make the corresponding
canonicalization/simplification when presented with a mult by a power of
two. Such a representation is canonical when representing a left-shifted
address inside a mem.

This patch remedies this situation. For rtxes such as:

(mult:DI (subreg:DI (reg:SI r) 0) (const_int 2^n))

where the bottom 32 + n bits are valid (the higher-order bits are
undefined) and make_extraction() is being asked to sign_extract the
lower (valid) bits, we now rewrite this as:

(mult:DI (sign_extend:DI (reg:SI r)) (const_int 2^n))

instead of using a sign_extract.

(This patch also fixes up a comment in expand_compound_operation() which
appears to have suffered from bitrot.)

For an example of the existing behavior in the ashift case, compiling
the following C testcase at -O2 on AArch64:

int h(void);
struct c d;
struct c {
    int e[1];
};

void g(void) {
  int k = 0;
  for (;; k = h()) {
    asm volatile ("" :: "r"(&d.e[k]));
  }
}

make_extraction gets called with len=34, pos_rtx=pos=0, unsignedp=false
(so we're sign_extracting the bottom 34 bits), and the following rtx in
inner:

(ashift:DI (subreg:DI (reg/v:SI 93 [ k ]) 0)
    (const_int 2 [0x2]))

where it is clear that the bottom 34 bits are valid (and the
higher-order bits are undefined). We then hit the block:

  else if (GET_CODE (inner) == ASHIFT
           && CONST_INT_P (XEXP (inner, 1))
           && pos_rtx == 0 && pos == 0
           && len > UINTVAL (XEXP (inner, 1)))
    {
      /* We're extracting the least significant bits of an rtx
         (ashift X (const_int C)), where LEN > C.  Extract the
         least significant (LEN - C) bits of X, giving an rtx
         whose mode is MODE, then shift it left C times.  */
      new_rtx = make_extraction (mode, XEXP (inner, 0),
                             0, 0, len - INTVAL (XEXP (inner, 1)),
                             unsignedp, in_dest, in_compare);
      if (new_rtx != 0)
        return gen_rtx_ASHIFT (mode, new_rtx, XEXP (inner, 1));
    }

and the recursive call to make_extraction() is asked to sign_extract the
bottom (LEN - C) = 32 bits of:

(subreg:DI (reg/v:SI 93) 0)

which gives us:

(sign_extend:DI (reg/v:SI 93 [ k ]))

The gen_rtx_ASHIFT call then gives us the final result:

(ashift:DI (sign_extend:DI (reg/v:SI 93 [ k ]))
    (const_int 2 [0x2]))

Now all that this patch does is teach the block that looks for these
shifts how to handle the same thing written as a mult instead of an
ashift. In particular, for the testcase in PR 96998, we hit
make_extraction with len=34, unsignedp=false, pos_rtx=pos=0, and inner
as:

(mult:DI (subreg:DI (reg/v:SI 92 [ g ]) 0)
    (const_int 4 [0x4]))

It should be clear from the above that this can be handled in an
analogous way: the recursive case is precisely the same; the only
difference is that we take the log2 of the shift amount and write the
end result as a mult instead.

This fixes several quality regressions on AArch64 after removing support
for addresses represented as sign_extract insns (1/2).

In particular, after the fix for PR96998, for the relevant testcase, we
have:

.L2:
        sxtw    x0, w0  // 8    [c=4 l=4]  *extendsidi2_aarch64/0
        add     x0, x19, x0, lsl 2      // 39   [c=8 l=4]  *add_lsl_di
        bl      h               // 11   [c=4 l=4]  *call_value_insn/1
        b       .L2             // 54   [c=4 l=4]  jump

and after this patch, we have:

.L2:
        add     x0, x19, w0, sxtw 2     // 39   [c=8 l=4]  *add_extendsi_shft_di
        bl      h               // 11   [c=4 l=4]  *call_value_insn/1
        b       .L2             // 54   [c=4 l=4]  jump

which restores the original optimal sequence we saw before the AArch64
patch.

---

gcc/ChangeLog:

        * combine.c (expand_compound_operation): Tweak variable name in
        comment to match source.
        (make_extraction): Handle mult by power of two in addition to
        ashift.
diff --git a/gcc/combine.c b/gcc/combine.c
index c88382efbd3..fe8eff2b464 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -7419,8 +7419,8 @@ expand_compound_operation (rtx x)
     }
 
   /* If we reach here, we want to return a pair of shifts.  The inner
-     shift is a left shift of BITSIZE - POS - LEN bits.  The outer
-     shift is a right shift of BITSIZE - LEN bits.  It is arithmetic or
+     shift is a left shift of MODEWIDTH - POS - LEN bits.  The outer
+     shift is a right shift of MODEWIDTH - LEN bits.  It is arithmetic or
      logical depending on the value of UNSIGNEDP.
 
      If this was a ZERO_EXTEND or ZERO_EXTRACT, this pair of shifts will be
@@ -7650,20 +7650,27 @@ make_extraction (machine_mode mode, rtx inner, HOST_WIDE_INT pos,
        is_mode = GET_MODE (SUBREG_REG (inner));
       inner = SUBREG_REG (inner);
     }
-  else if (GET_CODE (inner) == ASHIFT
+  else if ((GET_CODE (inner) == ASHIFT || GET_CODE (inner) == MULT)
           && CONST_INT_P (XEXP (inner, 1))
-          && pos_rtx == 0 && pos == 0
-          && len > UINTVAL (XEXP (inner, 1)))
-    {
-      /* We're extracting the least significant bits of an rtx
-        (ashift X (const_int C)), where LEN > C.  Extract the
-        least significant (LEN - C) bits of X, giving an rtx
-        whose mode is MODE, then shift it left C times.  */
-      new_rtx = make_extraction (mode, XEXP (inner, 0),
-                            0, 0, len - INTVAL (XEXP (inner, 1)),
-                            unsignedp, in_dest, in_compare);
-      if (new_rtx != 0)
-       return gen_rtx_ASHIFT (mode, new_rtx, XEXP (inner, 1));
+          && pos_rtx == 0 && pos == 0)
+    {
+      const HOST_WIDE_INT ci = INTVAL (XEXP (inner, 1));
+      const auto code = GET_CODE (inner);
+      const HOST_WIDE_INT shift_amt = (code == MULT) ? exact_log2 (ci) : ci;
+
+      if (shift_amt > 0 && len > (unsigned HOST_WIDE_INT)shift_amt)
+       {
+         /* We're extracting the least significant bits of an rtx
+            (ashift X (const_int C)) or (mult X (const_int (2^C))),
+            where LEN > C.  Extract the least significant (LEN - C) bits
+            of X, giving an rtx whose mode is MODE, then shift it left
+            C times.  */
+         new_rtx = make_extraction (mode, XEXP (inner, 0),
+                                0, 0, len - shift_amt,
+                                unsignedp, in_dest, in_compare);
+         if (new_rtx)
+           return gen_rtx_fmt_ee (code, mode, new_rtx, XEXP (inner, 1));
+       }
     }
   else if (GET_CODE (inner) == TRUNCATE
           /* If trying or potentionally trying to extract
