On Mon, May 6, 2024 at 11:24 PM Jeff Law <jeffreya...@gmail.com> wrote: > > > > On 5/6/24 2:40 PM, Christoph Müllner wrote: > > The combiner attempts to optimize a zero-extension of a logical right shift > > using zero_extract. We already utilize this optimization for those cases > > that result in a single instructions. Let's add a insn_and_split > > pattern that also matches the generic case, where we can emit an > > optimized sequence of a slli/srli. > > > > Tested with SPEC CPU 2017 (rv64gc). > > > > PR 111501 > > > > gcc/ChangeLog: > > > > * config/riscv/riscv.md (*lshr<GPR:mode>3_zero_extend_4): New > > pattern for zero-extraction. > > > > gcc/testsuite/ChangeLog: > > > > * gcc.target/riscv/pr111501.c: New test. > > * gcc.target/riscv/zero-extend-rshift-32.c: New test. > > * gcc.target/riscv/zero-extend-rshift-64.c: New test. > > * gcc.target/riscv/zero-extend-rshift.c: New test. > So I had Lyut looking in this space as well. Mostly because there's a > desire to avoid the srl+and approach and instead represent this stuff as > shifts (which are fusible in our uarch). SO I've already got some state... > > > > > > Signed-off-by: Christoph Müllner <christoph.muell...@vrull.eu> > > --- > > gcc/config/riscv/riscv.md | 30 +++++ > > gcc/testsuite/gcc.target/riscv/pr111501.c | 32 +++++ > > .../gcc.target/riscv/zero-extend-rshift-32.c | 37 ++++++ > > .../gcc.target/riscv/zero-extend-rshift-64.c | 63 ++++++++++ > > .../gcc.target/riscv/zero-extend-rshift.c | 119 ++++++++++++++++++ > > 5 files changed, 281 insertions(+) > > create mode 100644 gcc/testsuite/gcc.target/riscv/pr111501.c > > create mode 100644 gcc/testsuite/gcc.target/riscv/zero-extend-rshift-32.c > > create mode 100644 gcc/testsuite/gcc.target/riscv/zero-extend-rshift-64.c > > create mode 100644 gcc/testsuite/gcc.target/riscv/zero-extend-rshift.c > > > > diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md > > index d4676507b45..80cbecb78e8 100644 > > --- a/gcc/config/riscv/riscv.md > > +++ b/gcc/config/riscv/riscv.md > > @@ -2792,6 +2792,36 @@ (define_insn "*lshrsi3_zero_extend_3" > > [(set_attr "type" "shift") > > (set_attr "mode" "SI")]) > > > > +;; Canonical form for a zero-extend of a logical right shift. > > +;; Special cases are handled above. > > +;; Skip for single-bit extraction (Zbs/XTheadBs) and th.extu (XTheadBb) > > +(define_insn_and_split "*lshr<GPR:mode>3_zero_extend_4" > > + [(set (match_operand:GPR 0 "register_operand" "=r") > > + (zero_extract:GPR > > + (match_operand:GPR 1 "register_operand" " r") > > + (match_operand 2 "const_int_operand") > > + (match_operand 3 "const_int_operand"))) > > + (clobber (match_scratch:GPR 4 "=&r"))] > > + "!((TARGET_ZBS || TARGET_XTHEADBS) && (INTVAL (operands[2]) == 1)) > > + && !TARGET_XTHEADBB" > > + "#" > > + "&& reload_completed" > > + [(set (match_dup 4) > > + (ashift:GPR (match_dup 1) (match_dup 2))) > > + (set (match_dup 0) > > + (lshiftrt:GPR (match_dup 4) (match_dup 3)))] > Consider adding support for signed extractions as well. You just need > an iterator across zero_extract/sign_extract and suitable selection of > arithmetic vs logical right shift step.
The sign-extension/extraction code was worse than the zero-extension/extraction code. So, I ended up doing some initial work for addressing corner cases first, before converting this pattern using an any_extract iterator for the v2 (already on the list). > > A nit on the condition. Bring the && INTVAL (operands[2]) == 1 down to > a new line like you've gone with !TARGET_XTHEADBB. > > You also want to make sure the condition rejects the cases handled by > this pattern (or merge your pattern with this one): I kept the pattern, but added sign_extract support. > > > ;; Canonical form for a zero-extend of a logical right shift. > > (define_insn "*lshrsi3_zero_extend_2" > > [(set (match_operand:DI 0 "register_operand" "=r") > > (zero_extract:DI (match_operand:DI 1 "register_operand" " r") > > (match_operand 2 "const_int_operand") > > (match_operand 3 "const_int_operand")))] > > "(TARGET_64BIT && (INTVAL (operands[3]) > 0) > > && (INTVAL (operands[2]) + INTVAL (operands[3]) == 32))" > > { > > return "srliw\t%0,%1,%3"; > > } > > [(set_attr "type" "shift") > > (set_attr "mode" "SI")]) > > So generally going the right direction. But needs another iteration. Thanks for the review! > > Jeff >