Hi Segher,

On 22/10/2020 15:39, Segher Boessenkool wrote:
> On Thu, Oct 15, 2020 at 09:59:24AM +0100, Alex Coplan wrote:
> > Currently, make_extraction() identifies where we can emit an ASHIFT of
> > an extend in place of an extraction, but fails to make the corresponding
> > canonicalization/simplification when presented with a MULT by a power of
> > two. Such a representation is canonical when representing a left-shifted
> > address inside a MEM.
> >
> > This patch remedies this situation: after the patch, make_extraction()
> > now also identifies RTXs such as:
> >
> > (mult:DI (subreg:DI (reg:SI r)) (const_int 2^n))
> >
> > and rewrites this as:
> >
> > (mult:DI (sign_extend:DI (reg:SI r)) (const_int 2^n))
> >
> > instead of using a sign_extract.
>
> That is only correct if SUBREG_PROMOTED_VAR_P is true and
> SUBREG_PROMOTED_UNSIGNED_P is false for r.  Is that guaranteed to be
> true here (and how then?)
Sorry, I didn't give enough context here. For this subreg,
SUBREG_PROMOTED_VAR_P is not set, so I agree that this transformation in
isolation is not valid. The crucial piece of missing information is that
we only make this transformation in calls to make_extraction() where
len = 32 + n and pos_rtx = pos = 0 (so we're extracting the bottom
32 + n bits), and unsignedp is false (so we're doing a sign_extract).

Below is a proposed commit message, updated with this information.

OK for trunk?

Thanks,
Alex

---

Currently, make_extraction() identifies where we can emit an ashift of
an extend in place of an extraction, but fails to make the corresponding
canonicalization/simplification when presented with a mult by a power of
two. Such a representation is canonical when representing a left-shifted
address inside a mem.

This patch remedies this situation. For rtxes such as:

(mult:DI (subreg:DI (reg:SI r) 0) (const_int 2^n))

where the bottom 32 + n bits are valid (the higher-order bits are
undefined) and make_extraction() is being asked to sign_extract the
lower (valid) bits, after the patch, we rewrite this as:

(mult:DI (sign_extend:DI (reg:SI r)) (const_int 2^n))

instead of using a sign_extract.

(This patch also fixes up a comment in expand_compound_operation() which
appears to have suffered from bitrot.)

For an example of the existing behavior in the ashift case, compiling
the following C testcase at -O2 on AArch64:

int h(void);
struct c d;
struct c {
  int e[1];
};

void g(void)
{
  int k = 0;
  for (;; k = h())
    asm volatile ("" :: "r"(&d.e[k]));
}

make_extraction() gets called with len = 34, pos_rtx = pos = 0, and
unsignedp = false (so we're sign_extracting the bottom 34 bits), with
the following rtx in inner:

(ashift:DI (subreg:DI (reg/v:SI 93 [ k ]) 0)
    (const_int 2 [0x2]))

where it is clear that the bottom 34 bits are valid (and the
higher-order bits are undefined).
We then hit the block:

  else if (GET_CODE (inner) == ASHIFT
	   && CONST_INT_P (XEXP (inner, 1))
	   && pos_rtx == 0 && pos == 0
	   && len > UINTVAL (XEXP (inner, 1)))
    {
      /* We're extracting the least significant bits of an rtx
	 (ashift X (const_int C)), where LEN > C.  Extract the
	 least significant (LEN - C) bits of X, giving an rtx
	 whose mode is MODE, then shift it left C times.  */
      new_rtx = make_extraction (mode, XEXP (inner, 0),
				 0, 0, len - INTVAL (XEXP (inner, 1)),
				 unsignedp, in_dest, in_compare);
      if (new_rtx != 0)
	return gen_rtx_ASHIFT (mode, new_rtx, XEXP (inner, 1));
    }

and the recursive call to make_extraction() is asked to sign_extract the
bottom (LEN - C) = 32 bits of:

(subreg:DI (reg/v:SI 93) 0)

which gives us:

(sign_extend:DI (reg/v:SI 93 [ k ]))

The gen_rtx_ASHIFT call then gives us the final result:

(ashift:DI (sign_extend:DI (reg/v:SI 93 [ k ]))
    (const_int 2 [0x2]))

Now all that this patch does is to teach the block that looks for these
shifts how to handle the same thing written as a mult instead of an
ashift. In particular, for the testcase in the PR (96998), we hit
make_extraction() with len = 34, unsignedp = false, pos_rtx = pos = 0,
and inner as:

(mult:DI (subreg:DI (reg/v:SI 92 [ g ]) 0)
    (const_int 4 [0x4]))

It should be clear from the above that this can be handled in an
analogous way: the recursive case is precisely the same; the only
difference is that we take the log2 of the shift amount and write the
end result as a mult instead of an ashift.

This fixes several quality regressions on AArch64 after removing
support for addresses represented as sign_extract insns (1/2).
In particular, after the fix for PR96998, for the relevant testcase, we
have:

.L2:
	sxtw	x0, w0			// 8	[c=4 l=4]  *extendsidi2_aarch64/0
	add	x0, x19, x0, lsl 2	// 39	[c=8 l=4]  *add_lsl_di
	bl	h			// 11	[c=4 l=4]  *call_value_insn/1
	b	.L2			// 54	[c=4 l=4]  jump

and after this patch, we have:

.L2:
	add	x0, x19, w0, sxtw 2	// 39	[c=8 l=4]  *add_extendsi_shft_di
	bl	h			// 11	[c=4 l=4]  *call_value_insn/1
	b	.L2			// 54	[c=4 l=4]  jump

which restores the original optimal sequence we saw before the AArch64
patch.

---
gcc/ChangeLog:

	* combine.c (expand_compound_operation): Tweak variable name in
	comment to match source.
	(make_extraction): Handle mult by power of two in addition to
	ashift.
diff --git a/gcc/combine.c b/gcc/combine.c
index c88382efbd3..fe8eff2b464 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -7419,8 +7419,8 @@ expand_compound_operation (rtx x)
     }
 
   /* If we reach here, we want to return a pair of shifts.  The inner
-     shift is a left shift of BITSIZE - POS - LEN bits.  The outer
-     shift is a right shift of BITSIZE - LEN bits.  It is arithmetic or
+     shift is a left shift of MODEWIDTH - POS - LEN bits.  The outer
+     shift is a right shift of MODEWIDTH - LEN bits.  It is arithmetic or
      logical depending on the value of UNSIGNEDP.
 
      If this was a ZERO_EXTEND or ZERO_EXTRACT, this pair of shifts will be
@@ -7650,20 +7650,27 @@ make_extraction (machine_mode mode, rtx inner, HOST_WIDE_INT pos,
 	  is_mode = GET_MODE (SUBREG_REG (inner));
 	  inner = SUBREG_REG (inner);
 	}
-      else if (GET_CODE (inner) == ASHIFT
+      else if ((GET_CODE (inner) == ASHIFT || GET_CODE (inner) == MULT)
 	       && CONST_INT_P (XEXP (inner, 1))
-	       && pos_rtx == 0 && pos == 0
-	       && len > UINTVAL (XEXP (inner, 1)))
-	{
-	  /* We're extracting the least significant bits of an rtx
-	     (ashift X (const_int C)), where LEN > C.  Extract the
-	     least significant (LEN - C) bits of X, giving an rtx
-	     whose mode is MODE, then shift it left C times.  */
-	  new_rtx = make_extraction (mode, XEXP (inner, 0),
-				     0, 0, len - INTVAL (XEXP (inner, 1)),
-				     unsignedp, in_dest, in_compare);
-	  if (new_rtx != 0)
-	    return gen_rtx_ASHIFT (mode, new_rtx, XEXP (inner, 1));
+	       && pos_rtx == 0 && pos == 0)
+	{
+	  const HOST_WIDE_INT ci = INTVAL (XEXP (inner, 1));
+	  const auto code = GET_CODE (inner);
+	  const HOST_WIDE_INT shift_amt = (code == MULT) ? exact_log2 (ci) : ci;
+
+	  if (shift_amt > 0 && len > (unsigned HOST_WIDE_INT) shift_amt)
+	    {
+	      /* We're extracting the least significant bits of an rtx
+		 (ashift X (const_int C)) or (mult X (const_int (2^C))),
+		 where LEN > C.  Extract the least significant (LEN - C)
+		 bits of X, giving an rtx whose mode is MODE, then shift
+		 it left C times.  */
+	      new_rtx = make_extraction (mode, XEXP (inner, 0),
+					 0, 0, len - shift_amt,
+					 unsignedp, in_dest, in_compare);
+	      if (new_rtx)
+		return gen_rtx_fmt_ee (code, mode, new_rtx, XEXP (inner, 1));
+	    }
 	}
       else if (GET_CODE (inner) == TRUNCATE
 	       /* If trying or potentionally trying to extract