Thanks for doing this.  Just a couple of comments about the .md stuff:

Michael Collison <michael.colli...@arm.com> writes:
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 5adc5ed..c6ae670 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -3999,6 +3999,92 @@
>  }
>  )
>
> +;; When the LSL, LSR, ASR, ROR instructions operate on all register arguments
> +;; they truncate the shift/rotate amount by the size of the registers they
> +;; operate on: 32 for W-regs, 63 for X-regs.  This allows us to optimise away
> +;; such redundant masking instructions.  GCC can do that automatically when
> +;; SHIFT_COUNT_TRUNCATED is true, but we can't enable it for TARGET_SIMD
> +;; because some of the SISD shift alternatives don't perform this
> +;; truncations.  So this pattern exists to catch such cases.
> +
> +(define_insn "*aarch64_<optab>_reg_<mode>3_mask1"
> +  [(set (match_operand:GPI 0 "register_operand" "=r")
> +        (SHIFT:GPI
> +          (match_operand:GPI 1 "register_operand" "r")
> +          (subreg:QI (and:GPI
> +                       (match_operand:GPI 2 "register_operand" "r")
> +                       (match_operand 3 "const_int_operand" "n")) 0)))]
> +  "(~INTVAL (operands[3]) & (GET_MODE_BITSIZE (<MODE>mode)-1)) == 0"
> +  "<shift>\t%<w>0, %<w>1, %<w>2"
> +  [(set_attr "type" "shift_reg")]
> +)
(subreg:QI (...) 0) is only correct for little endian.  For big endian
it needs to be 3 for SI or 7 for DI.  You could probably handle that
using something like:

  (match_operator:QI 2 "lowpart_subreg" [(and:GPI ...)])

and defining a lowpart_subreg predicate that checks the SUBREG_BYTE.
Or just leave the subreg as-is and add !BYTES_BIG_ENDIAN to the
insn condition.

> +(define_insn_and_split "*aarch64_reg_<mode>3_neg_mask2"
> +  [(set (match_operand:GPI 0 "register_operand" "=r")
> +        (SHIFT:GPI
> +          (match_operand:GPI 1 "register_operand" "r")
> +          (subreg:QI (neg:SI (and:SI
> +                               (match_operand:SI 2 "register_operand" "r")
> +                               (match_operand 3 "const_int_operand" "n"))) 0)))]
> +  "((~INTVAL (operands[3]) & (GET_MODE_BITSIZE (<MODE>mode)-1)) == 0)
> +   && can_create_pseudo_p ()"
> +  "#"
> +  "&& true"
> +  [(const_int 0)]
> +  {
> +    rtx tmp = gen_reg_rtx (SImode);
> +
> +    emit_insn (gen_negsi2 (tmp, operands[2]));
> +    rtx tmp2 = simplify_gen_subreg (QImode, tmp, SImode, 0);
> +    emit_insn (gen_<optab><mode>3 (operands[0], operands[1], tmp2));
> +    DONE;
> +  }
> +)

Insn patterns shouldn't check can_create_pseudo_p, because there's no
guarantee that the associated split happens before RA.  In this case
it should be safe to reuse operand 0 after RA if you change it to:

  [(set (match_operand:GPI 0 "register_operand" "=&r")
   ...)]

and then:

  rtx tmp = (can_create_pseudo_p ()
             ? gen_reg_rtx (SImode)
             : gen_lowpart (SImode, operands[0]));

Thanks,
Richard
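
[Editor's note: for concreteness, the lowpart_subreg predicate suggested
above could be defined roughly as below.  This is only a sketch: no such
predicate exists in aarch64/predicates.md at the time of the review, and
the name is simply the one suggested in the reply.  It leans on the
existing subreg_lowpart_p helper, which performs exactly the endian-aware
SUBREG_BYTE check being asked for:

  ;; True for a SUBREG that selects the low part of its inner value,
  ;; regardless of endianness (byte 0 on LE, 3/7 for SI/DI on BE).
  (define_predicate "lowpart_subreg"
    (match_code "subreg")
  {
    return subreg_lowpart_p (op);
  })

Used with (match_operator:QI 2 "lowpart_subreg" [(and:GPI ...)]), the
pattern would then accept the correct subreg byte on both endiannesses.]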
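
[Editor's note: putting the two fragments from the reply together, the
reworked split might look roughly as follows.  This is an untested sketch
assembled from the suggestions above, not the committed fix; it keeps the
rest of the quoted pattern unchanged:

  (define_insn_and_split "*aarch64_reg_<mode>3_neg_mask2"
    [(set (match_operand:GPI 0 "register_operand" "=&r")
          (SHIFT:GPI
            (match_operand:GPI 1 "register_operand" "r")
            (subreg:QI (neg:SI (and:SI
                                 (match_operand:SI 2 "register_operand" "r")
                                 (match_operand 3 "const_int_operand" "n"))) 0)))]
    "(~INTVAL (operands[3]) & (GET_MODE_BITSIZE (<MODE>mode)-1)) == 0"
    "#"
    "&& true"
    [(const_int 0)]
    {
      /* Before RA we can take a fresh pseudo; after RA we reuse the
         earlyclobber output register instead.  */
      rtx tmp = (can_create_pseudo_p ()
                 ? gen_reg_rtx (SImode)
                 : gen_lowpart (SImode, operands[0]));

      emit_insn (gen_negsi2 (tmp, operands[2]));
      rtx tmp2 = simplify_gen_subreg (QImode, tmp, SImode, 0);
      emit_insn (gen_<optab><mode>3 (operands[0], operands[1], tmp2));
      DONE;
    }
  )

The "=&r" earlyclobber is what makes reusing operand 0 safe: it guarantees
the output register does not overlap operands 1 or 2.]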