Michael Collison <michael.colli...@arm.com> writes:
> +(define_insn_and_split "*aarch64_reg_<mode>3_neg_mask2"
> +  [(set (match_operand:GPI 0 "register_operand" "=r")
> +     (SHIFT:GPI
> +       (match_operand:GPI 1 "register_operand" "r")
> +       (match_operator 4 "subreg_lowpart_operator"
> +       [(neg:SI (and:SI (match_operand:SI 2 "register_operand" "r")
> +                        (match_operand 3 "const_int_operand" "n")))])))
> +   (clobber (match_scratch:SI 5 "=&r"))]
> +  "((~INTVAL (operands[3]) & (GET_MODE_BITSIZE (<MODE>mode)-1)) == 0)"
> +  "#"
> +  "&& reload_completed"
> +  [(const_int 0)]
> +  {
> +    emit_insn (gen_negsi2 (operands[5], operands[2]));
> +
> +    rtx and_op = gen_rtx_AND (SImode, operands[5], operands[3]);
> +    rtx subreg_tmp = gen_rtx_SUBREG (GET_MODE (operands[4]), and_op,
> +                                  SUBREG_BYTE (operands[4]));
> +    emit_insn (gen_<optab><mode>3 (operands[0], operands[1], subreg_tmp));
> +    DONE;
> +  }
> +)

Thanks, I agree this looks correct from the split/reload_completed POV.
I think we can go one better though, either:

(a) Still allow the split when !reload_completed, and use:

     if (GET_CODE (operands[5]) == SCRATCH)
       operands[5] = gen_reg_rtx (SImode);

    This will allow the individual instructions to be scheduled by sched1;
    a sketch of what (a) looks like on its own follows after (b).

(b) Continue to restrict the split to reload_completed, change operand 0
    to =&r so that it can be used as a temporary, and drop operand 5 entirely.
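
For concreteness, (a) on its own would look roughly like this (untested;
the only changes to the quoted pattern are the "&& 1" split condition and
the SCRATCH check in the split code):

(define_insn_and_split "*aarch64_reg_<mode>3_neg_mask2"
  [(set (match_operand:GPI 0 "register_operand" "=r")
        (SHIFT:GPI
          (match_operand:GPI 1 "register_operand" "r")
          (match_operator 4 "subreg_lowpart_operator"
          [(neg:SI (and:SI (match_operand:SI 2 "register_operand" "r")
                           (match_operand 3 "const_int_operand" "n")))])))
   (clobber (match_scratch:SI 5 "=&r"))]
  "((~INTVAL (operands[3]) & (GET_MODE_BITSIZE (<MODE>mode)-1)) == 0)"
  "#"
  "&& 1"
  [(const_int 0)]
  {
    /* Before reload the clobber is still a SCRATCH, so take a fresh pseudo.  */
    if (GET_CODE (operands[5]) == SCRATCH)
      operands[5] = gen_reg_rtx (SImode);

    emit_insn (gen_negsi2 (operands[5], operands[2]));

    rtx and_op = gen_rtx_AND (SImode, operands[5], operands[3]);
    rtx subreg_tmp = gen_rtx_SUBREG (GET_MODE (operands[4]), and_op,
                                     SUBREG_BYTE (operands[4]));
    emit_insn (gen_<optab><mode>3 (operands[0], operands[1], subreg_tmp));
    DONE;
  }
)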

Or perhaps do both:

(define_insn_and_split "*aarch64_reg_<mode>3_neg_mask2"
  [(set (match_operand:GPI 0 "register_operand" "=&r")
        (SHIFT:GPI
          (match_operand:GPI 1 "register_operand" "r")
          (match_operator 4 "subreg_lowpart_operator"
          [(neg:SI (and:SI (match_operand:SI 2 "register_operand" "r")
                           (match_operand 3 "const_int_operand" "n")))])))]
  "((~INTVAL (operands[3]) & (GET_MODE_BITSIZE (<MODE>mode)-1)) == 0)"
  "#"
  "&& 1"
  [(const_int 0)]
  {
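    /* Before RA grab a fresh SImode temporary; after RA, reuse the SImode
       lowpart of the earlyclobbered destination instead.  */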
    rtx tmp = (can_create_pseudo_p () ? gen_reg_rtx (SImode)
               : lowpart_subreg (SImode, operands[0], <MODE>mode));
    emit_insn (gen_negsi2 (tmp, operands[2]));

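    /* Recreate the lowpart subreg matched by operand 4, now wrapped around
       the explicit AND of the negated amount.  */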
    rtx and_op = gen_rtx_AND (SImode, tmp, operands[3]);
    rtx subreg_tmp = gen_rtx_SUBREG (GET_MODE (operands[4]), and_op,
                                     SUBREG_BYTE (operands[4]));
    emit_insn (gen_<optab><mode>3 (operands[0], operands[1], subreg_tmp));
    DONE;
  }
)

Sorry for the run-around.  I should have realised earlier that these
patterns didn't really need a distinct register after RA.

Thanks,
Richard
