Hi, Thanks for the update. The patch mostly looks good, but one minor and one more substantial comment below.
BTW, the patch seems to have been corrupted en route, in that unchanged lines have too much space. Attaching is fine if that's easier. Dhruv Chawla <dhr...@nvidia.com> writes: > diff --git a/gcc/config/aarch64/aarch64-sve.md > b/gcc/config/aarch64/aarch64-sve.md > index a72ca2a500d..42802bac653 100644 > --- a/gcc/config/aarch64/aarch64-sve.md > +++ b/gcc/config/aarch64/aarch64-sve.md > @@ -4149,80 +4149,58 @@ > (define_expand "@aarch64_adr<mode>_shift" > [(set (match_operand:SVE_FULL_SDI 0 "register_operand") > (plus:SVE_FULL_SDI > - (unspec:SVE_FULL_SDI > - [(match_dup 4) > - (ashift:SVE_FULL_SDI > - (match_operand:SVE_FULL_SDI 2 "register_operand") > - (match_operand:SVE_FULL_SDI 3 "const_1_to_3_operand"))] > - UNSPEC_PRED_X) > + (ashift:SVE_FULL_SDI > + (match_operand:SVE_FULL_SDI 2 "register_operand") > + (match_operand:SVE_FULL_SDI 3 "const_1_to_3_operand")) > (match_operand:SVE_FULL_SDI 1 "register_operand")))] > "TARGET_SVE && TARGET_NON_STREAMING" > - { > - operands[4] = CONSTM1_RTX (<VPRED>mode); > - } > + {} > ) The {} can be removed. > [...] > @@ -4803,6 +4781,9 @@ > > ;; Unpredicated shift by a scalar, which expands into one of the vector > ;; shifts below. > +;; > +;; The unpredicated form is emitted only when the shift amount is a constant > +;; value that is valid for the shift being carried out. > (define_expand "<ASHIFT:optab><mode>3" > [(set (match_operand:SVE_I 0 "register_operand") > (ASHIFT:SVE_I > @@ -4810,20 +4791,29 @@ > (match_operand:<VEL> 2 "general_operand")))] > "TARGET_SVE" > { > - rtx amount; > + rtx amount = NULL_RTX; > if (CONST_INT_P (operands[2])) > { > - amount = gen_const_vec_duplicate (<MODE>mode, operands[2]); > - if (!aarch64_sve_<lr>shift_operand (operands[2], <MODE>mode)) > - amount = force_reg (<MODE>mode, amount); > + if (aarch64_simd_shift_imm_p (operands[2], <MODE>mode, <optab>_optab == > ashl_optab)) > + operands[2] = aarch64_simd_gen_const_vector_dup (<MODE>mode, INTVAL > (operands[2])); > + else > + { > + amount = gen_const_vec_duplicate (<MODE>mode, operands[2]); > + if (!aarch64_sve_<lr>shift_operand (operands[2], <MODE>mode)) > + amount = force_reg (<MODE>mode, amount); > + } > } > else > { > amount = convert_to_mode (<VEL>mode, operands[2], 0); > amount = expand_vector_broadcast (<MODE>mode, amount); > } > - emit_insn (gen_v<optab><mode>3 (operands[0], operands[1], amount)); > - DONE; > + > + if (amount) > + { > + emit_insn (gen_v<optab><mode>3 (operands[0], operands[1], amount)); > + DONE; > + } > } > ) > Instead of the two hunks above, I think we should leave <ASHIFT:optab><mode>3 alone and change v<optab><mode>3. This would involve changing: > @@ -4867,27 +4857,27 @@ > "" > ) > -;; Unpredicated shift operations by a constant (post-RA only). > +;; Unpredicated shift operations by a constant. > ;; These are generated by splitting a predicated instruction whose > ;; predicate is unused. > -(define_insn "*post_ra_v_ashl<mode>3" > +(define_insn "*v_ashl<mode>3" > [(set (match_operand:SVE_I 0 "register_operand") > (ashift:SVE_I > (match_operand:SVE_I 1 "register_operand") > (match_operand:SVE_I 2 "aarch64_simd_lshift_imm")))] > - "TARGET_SVE && reload_completed" > + "TARGET_SVE" > {@ [ cons: =0 , 1 , 2 ] > [ w , w , vs1 ] add\t%0.<Vetype>, %1.<Vetype>, %1.<Vetype> > [ w , w , Dl ] lsl\t%0.<Vetype>, %1.<Vetype>, #%2 > } > ) > -(define_insn "*post_ra_v_<optab><mode>3" > +(define_insn "*v_<optab><mode>3" > [(set (match_operand:SVE_I 0 "register_operand" "=w") > (SHIFTRT:SVE_I > (match_operand:SVE_I 1 "register_operand" "w") > (match_operand:SVE_I 2 "aarch64_simd_rshift_imm")))] > - "TARGET_SVE && reload_completed" > + "TARGET_SVE" > "<shift>\t%0.<Vetype>, %1.<Vetype>, #%2" > ) ...these instructions to named patterns, e.g. aarch64_vashl<mode>3_const and aarch64_v<optab><mode>3_const respectively (with no _ after v, for consistency with the optab name). Then v<optab><mode>3 could generate aarch64_v<optab><mode>3_const for the constant case. Thanks, Richard