On 5/6/24 13:40, Christoph Müllner wrote:
> The combiner attempts to optimize a zero-extension of a logical right shift
> using zero_extract.  We already utilize this optimization for those cases
> that result in a single instruction.  Let's add an insn_and_split
> pattern that also matches the generic case, where we can emit an
> optimized slli/srli sequence.
>
> ...
>
> diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
> index d4676507b45..80cbecb78e8 100644
> --- a/gcc/config/riscv/riscv.md
> +++ b/gcc/config/riscv/riscv.md
> @@ -2792,6 +2792,36 @@ (define_insn "*lshrsi3_zero_extend_3"
>    [(set_attr "type" "shift")
>     (set_attr "mode" "SI")])
>  
> +;; Canonical form for a zero-extend of a logical right shift.
> +;; Special cases are handled above.
> +;; Skip for single-bit extraction (Zbs/XTheadBs) and th.extu (XTheadBb)

Dumb question: why skip for Zbs?  Zb[abs] is going to be very common going
forward, so this pattern will end up being unused there.

> +(define_insn_and_split "*lshr<GPR:mode>3_zero_extend_4"
> +  [(set (match_operand:GPR 0 "register_operand" "=r")
> +      (zero_extract:GPR
> +       (match_operand:GPR 1 "register_operand" " r")
> +       (match_operand     2 "const_int_operand")
> +       (match_operand     3 "const_int_operand")))
> +   (clobber (match_scratch:GPR  4 "=&r"))]
> +  "!((TARGET_ZBS || TARGET_XTHEADBS) && (INTVAL (operands[2]) == 1))
> +   && !TARGET_XTHEADBB"
> +  "#"
> +  "&& reload_completed"
> +  [(set (match_dup 4)
> +     (ashift:GPR (match_dup 1) (match_dup 2)))
> +   (set (match_dup 0)
> +     (lshiftrt:GPR (match_dup 4) (match_dup 3)))]
> +{
> +  int regbits = GET_MODE_BITSIZE (GET_MODE (operands[0])).to_constant ();
> +  int sizebits = INTVAL (operands[2]);
> +  int startbits = INTVAL (operands[3]);
> +  int lshamt = regbits - sizebits - startbits;
> +  int rshamt = lshamt + startbits;
> +  operands[2] = GEN_INT (lshamt);
> +  operands[3] = GEN_INT (rshamt);
> +}
> +  [(set_attr "type" "shift")
> +   (set_attr "mode" "<GPR:MODE>")])
> +
>  ;; Handle AND with 2^N-1 for N from 12 to XLEN.  This can be split into
>  ;; two logical shifts.  Otherwise it requires 3 instructions: lui,
>  ;; xor/addi/srli, and.
> diff --git a/gcc/testsuite/gcc.target/riscv/pr111501.c b/gcc/testsuite/gcc.target/riscv/pr111501.c
> new file mode 100644
> index 00000000000..9355be242e7
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/pr111501.c
> @@ -0,0 +1,32 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target rv64 } */
> +/* { dg-options "-march=rv64gc" { target { rv64 } } } */
> +/* { dg-skip-if "" { *-*-* } {"-O0" "-Os" "-Og" "-Oz" "-flto" } } */
> +/* { dg-final { check-function-bodies "**" "" } } */

Is the function-body check really needed?  Wouldn't counting the srli and
slli instructions each be sufficient?
Last year we saw a lot of false failures in such tests caused by unrelated
scheduling changes tripping them up.
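A count-based check along these lines might look like the following (a
sketch; the exact counts depend on the final codegen for the testcase):

```c
/* { dg-final { scan-assembler-times "slli\t" 2 } } */
/* { dg-final { scan-assembler-times "srli\t" 2 } } */
```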

> +/* { dg-allow-blank-lines-in-output 1 } */
> +
> +/*
> +**do_shift:
> +**    ...
> +**    slli\ta[0-9],a[0-9],16
> +**    srli\ta[0-9],a[0-9],48
> +**    ...
> +*/
> +unsigned int
> +do_shift(unsigned long csum)
> +{
> +  return (unsigned short)(csum >> 32);
> +}
> +
> +/*
> +**do_shift2:
> +**    ...
> +**    slli\ta[0-9],a[0-9],16
> +**    srli\ta[0-9],a[0-9],48
> +**    ...
> +*/
> +unsigned int
> +do_shift2(unsigned long csum)
> +{
> +  return (csum << 16) >> 48;
> +}
> diff --git a/gcc/testsuite/gcc.target/riscv/zero-extend-rshift-32.c b/gcc/testsuite/gcc.target/riscv/zero-extend-rshift-32.c
> new file mode 100644
> index 00000000000..2824d6fe074
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/zero-extend-rshift-32.c
> @@ -0,0 +1,37 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target rv32 } */
> +/* { dg-options "-march=rv32gc" } */
> +/* { dg-skip-if "" { *-*-* } {"-O0" "-Os" "-Og" "-Oz" "-flto" } } */
> +/* { dg-final { check-function-bodies "**" "" } } */

Same as above, counts where possible.

-Vineet
