On 5/6/24 13:40, Christoph Müllner wrote:
> The combiner attempts to optimize a zero-extension of a logical right shift
> using zero_extract. We already utilize this optimization for those cases
> that result in a single instruction. Let's add an insn_and_split
> pattern that also matches the generic case, where we can emit an
> optimized slli/srli sequence.
>
> ...
>
> diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
> index d4676507b45..80cbecb78e8 100644
> --- a/gcc/config/riscv/riscv.md
> +++ b/gcc/config/riscv/riscv.md
> @@ -2792,6 +2792,36 @@ (define_insn "*lshrsi3_zero_extend_3"
> [(set_attr "type" "shift")
> (set_attr "mode" "SI")])
>
> +;; Canonical form for a zero-extend of a logical right shift.
> +;; Special cases are handled above.
> +;; Skip for single-bit extraction (Zbs/XTheadBs) and th.extu (XTheadBb)
Dumb question: why skip this for Zbs? Zb[abs] is going to be very common
going forward, so this pattern will end up unused there.
> +(define_insn_and_split "*lshr<GPR:mode>3_zero_extend_4"
> + [(set (match_operand:GPR 0 "register_operand" "=r")
> + (zero_extract:GPR
> + (match_operand:GPR 1 "register_operand" " r")
> + (match_operand 2 "const_int_operand")
> + (match_operand 3 "const_int_operand")))
> + (clobber (match_scratch:GPR 4 "=&r"))]
> + "!((TARGET_ZBS || TARGET_XTHEADBS) && (INTVAL (operands[2]) == 1))
> + && !TARGET_XTHEADBB"
> + "#"
> + "&& reload_completed"
> + [(set (match_dup 4)
> + (ashift:GPR (match_dup 1) (match_dup 2)))
> + (set (match_dup 0)
> + (lshiftrt:GPR (match_dup 4) (match_dup 3)))]
> +{
> + int regbits = GET_MODE_BITSIZE (GET_MODE (operands[0])).to_constant ();
> + int sizebits = INTVAL (operands[2]);
> + int startbits = INTVAL (operands[3]);
> + int lshamt = regbits - sizebits - startbits;
> + int rshamt = lshamt + startbits;
> + operands[2] = GEN_INT (lshamt);
> + operands[3] = GEN_INT (rshamt);
> +}
> + [(set_attr "type" "shift")
> + (set_attr "mode" "<GPR:MODE>")])
> +
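For anyone following the shift-amount arithmetic in the split above, here is a
quick C sketch of what the generated slli/srli pair computes for XLEN=64
(illustration only, not part of the patch; `extract_bits` is a made-up helper):

```c
#include <stdint.h>

/* Extract SIZE bits starting at bit START (counted from the LSB) of X,
   using the same shift-amount arithmetic as the split pattern:
   first shift left to clear everything above the field, then shift
   right to place the field at bit 0 with zero-extension.  */
static uint64_t
extract_bits (uint64_t x, int start, int size)
{
  int regbits = 64;                    /* GET_MODE_BITSIZE for DImode    */
  int lshamt = regbits - size - start; /* slli amount: clears high bits  */
  int rshamt = lshamt + start;         /* srli amount: == regbits - size */
  return (x << lshamt) >> rshamt;
}
```

For the do_shift test below (size 16, start 32) this gives lshamt 16 and
rshamt 48, matching the expected slli/srli pair.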
> ;; Handle AND with 2^N-1 for N from 12 to XLEN. This can be split into
> ;; two logical shifts. Otherwise it requires 3 instructions: lui,
> ;; xor/addi/srli, and.
> diff --git a/gcc/testsuite/gcc.target/riscv/pr111501.c
> b/gcc/testsuite/gcc.target/riscv/pr111501.c
> new file mode 100644
> index 00000000000..9355be242e7
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/pr111501.c
> @@ -0,0 +1,32 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target rv64 } */
> +/* { dg-options "-march=rv64gc" { target { rv64 } } } */
> +/* { dg-skip-if "" { *-*-* } {"-O0" "-Os" "-Og" "-Oz" "-flto" } } */
> +/* { dg-final { check-function-bodies "**" "" } } */
Is the function-body check really needed? Wouldn't counting the srli and
slli instructions each be sufficient?
Last year we saw a lot of false failures where unrelated scheduling
changes tripped these up.
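Something like this, for instance (exact regexes are just an illustration;
both functions above produce one slli/srli pair each):

```c
/* { dg-final { scan-assembler-times {slli\ta[0-9]+,a[0-9]+,16} 2 } } */
/* { dg-final { scan-assembler-times {srli\ta[0-9]+,a[0-9]+,48} 2 } } */
```

That keeps the check on the shift amounts we care about without pinning
down the surrounding instruction order.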
> +/* { dg-allow-blank-lines-in-output 1 } */
> +
> +/*
> +**do_shift:
> +** ...
> +** slli\ta[0-9],a[0-9],16
> +** srli\ta[0-9],a[0-9],48
> +** ...
> +*/
> +unsigned int
> +do_shift(unsigned long csum)
> +{
> + return (unsigned short)(csum >> 32);
> +}
> +
> +/*
> +**do_shift2:
> +** ...
> +** slli\ta[0-9],a[0-9],16
> +** srli\ta[0-9],a[0-9],48
> +** ...
> +*/
> +unsigned int
> +do_shift2(unsigned long csum)
> +{
> + return (csum << 16) >> 48;
> +}
> diff --git a/gcc/testsuite/gcc.target/riscv/zero-extend-rshift-32.c
> b/gcc/testsuite/gcc.target/riscv/zero-extend-rshift-32.c
> new file mode 100644
> index 00000000000..2824d6fe074
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/zero-extend-rshift-32.c
> @@ -0,0 +1,37 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target rv32 } */
> +/* { dg-options "-march=rv32gc" } */
> +/* { dg-skip-if "" { *-*-* } {"-O0" "-Os" "-Og" "-Oz" "-flto" } } */
> +/* { dg-final { check-function-bodies "**" "" } } */
Same as above, counts where possible.
-Vineet