Thanks for doing this.  Just a couple of comments about the .md stuff:

Michael Collison <michael.colli...@arm.com> writes:
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 5adc5ed..c6ae670 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -3999,6 +3999,92 @@
>  }
>  )
>
> +;; When the LSL, LSR, ASR, ROR instructions operate on all register arguments
> +;; they truncate the shift/rotate amount by the size of the registers they
> +;; operate on: 32 for W-regs, 63 for X-regs.  This allows us to optimise away
> +;; such redundant masking instructions.  GCC can do that automatically when
> +;; SHIFT_COUNT_TRUNCATED is true, but we can't enable it for TARGET_SIMD
> +;; because some of the SISD shift alternatives don't perform this
> +;; truncations.  So this pattern exists to catch such cases.
> +
> +(define_insn "*aarch64_<optab>_reg_<mode>3_mask1"
> +  [(set (match_operand:GPI 0 "register_operand" "=r")
> +        (SHIFT:GPI
> +          (match_operand:GPI 1 "register_operand" "r")
> +          (subreg:QI (and:GPI
> +                       (match_operand:GPI 2 "register_operand" "r")
> +                       (match_operand 3 "const_int_operand" "n")) 0)))]
> +  "(~INTVAL (operands[3]) & (GET_MODE_BITSIZE (<MODE>mode)-1)) == 0"
> +  "<shift>\t%<w>0, %<w>1, %<w>2"
> +  [(set_attr "type" "shift_reg")]
> +)
(subreg:QI (...) 0) is only correct for little endian.  For big endian
it needs to be 3 for SI or 7 for DI.  You could probably handle that
using something like:

  (match_operator:QI 2 "lowpart_subreg" [(and:GPI ...)])

and defining a lowpart_subreg predicate that checks the SUBREG_BYTE.
Or just leave the subreg as-is and add !BYTES_BIG_ENDIAN to the
insn condition.

> +(define_insn_and_split "*aarch64_reg_<mode>3_neg_mask2"
> +  [(set (match_operand:GPI 0 "register_operand" "=r")
> +        (SHIFT:GPI
> +          (match_operand:GPI 1 "register_operand" "r")
> +          (subreg:QI (neg:SI (and:SI
> +                               (match_operand:SI 2 "register_operand" "r")
> +                               (match_operand 3 "const_int_operand" "n"))) 0)))]
> +  "((~INTVAL (operands[3]) & (GET_MODE_BITSIZE (<MODE>mode)-1)) == 0)
> +   && can_create_pseudo_p ()"
> +  "#"
> +  "&& true"
> +  [(const_int 0)]
> +  {
> +    rtx tmp = gen_reg_rtx (SImode);
> +
> +    emit_insn (gen_negsi2 (tmp, operands[2]));
> +    rtx tmp2 = simplify_gen_subreg (QImode, tmp, SImode, 0);
> +    emit_insn (gen_<optab><mode>3 (operands[0], operands[1], tmp2));
> +    DONE;
> +  }
> +)

Insn patterns shouldn't check can_create_pseudo_p, because there's no
guarantee that the associated split happens before RA.  In this case
it should be safe to reuse operand 0 after RA if you change it to:

  [(set (match_operand:GPI 0 "register_operand" "=&r")
   ...)]

and then:

  rtx tmp = (can_create_pseudo_p ()
             ? gen_reg_rtx (SImode)
             : gen_lowpart (SImode, operands[0]));

Thanks,
Richard
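
[Editor's note: for concreteness, the lowpart_subreg predicate suggested
above could be defined roughly as below.  This is only a sketch: no such
predicate exists in aarch64/predicates.md at the time of the review, and
the name is simply the one suggested in the reply.  It leans on the
existing subreg_lowpart_p helper, which performs exactly the endian-aware
SUBREG_BYTE check being asked for:

  ;; True for a SUBREG that selects the low part of its inner value,
  ;; regardless of endianness (byte 0 on LE, 3/7 for SI/DI on BE).
  (define_predicate "lowpart_subreg"
    (match_code "subreg")
  {
    return subreg_lowpart_p (op);
  })

Used with (match_operator:QI 2 "lowpart_subreg" [(and:GPI ...)]), the
pattern would then accept the correct subreg byte on both endiannesses.]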
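
[Editor's note: putting the two fragments from the reply together, the
reworked split might look roughly as follows.  This is an untested sketch
assembled from the suggestions above, not the committed fix; it keeps the
rest of the quoted pattern unchanged:

  (define_insn_and_split "*aarch64_reg_<mode>3_neg_mask2"
    [(set (match_operand:GPI 0 "register_operand" "=&r")
          (SHIFT:GPI
            (match_operand:GPI 1 "register_operand" "r")
            (subreg:QI (neg:SI (and:SI
                                 (match_operand:SI 2 "register_operand" "r")
                                 (match_operand 3 "const_int_operand" "n"))) 0)))]
    "(~INTVAL (operands[3]) & (GET_MODE_BITSIZE (<MODE>mode)-1)) == 0"
    "#"
    "&& true"
    [(const_int 0)]
    {
      /* Before RA we can take a fresh pseudo; after RA we reuse the
         earlyclobber output register instead.  */
      rtx tmp = (can_create_pseudo_p ()
                 ? gen_reg_rtx (SImode)
                 : gen_lowpart (SImode, operands[0]));

      emit_insn (gen_negsi2 (tmp, operands[2]));
      rtx tmp2 = simplify_gen_subreg (QImode, tmp, SImode, 0);
      emit_insn (gen_<optab><mode>3 (operands[0], operands[1], tmp2));
      DONE;
    }
  )

The "=&r" earlyclobber is what makes reusing operand 0 safe: it guarantees
the output register does not overlap operands 1 or 2.]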