Updated the patch per Richard's suggestions to allow scheduling of instructions 
before reload.

Bootstrapped and tested on aarch64-linux-gnu. Okay for trunk?

2017-05-22  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>
            Michael Collison <michael.colli...@arm.com>

        PR target/70119
        * config/aarch64/aarch64.md (*aarch64_<optab>_reg_<mode>3_mask1):
        New pattern.
        (*aarch64_reg_<mode>3_neg_mask2): New pattern.
        (*aarch64_reg_<mode>3_minus_mask): New pattern.
        (*aarch64_<optab>_reg_di3_mask2): New pattern.
        * config/aarch64/aarch64.c (aarch64_rtx_costs): Account for cost
        of shift when the shift amount is masked with constant equal to
        the size of the mode.
        * config/aarch64/predicates.md (subreg_lowpart_operator): New
        predicate.


2017-05-22  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>
            Michael Collison <michael.colli...@arm.com>

        PR target/70119
        * gcc.target/aarch64/var_shift_mask_1.c: New test.
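
For reference, a minimal illustration of the kind of source the new
patterns target (this fragment is illustrative only; the actual
coverage is in the new var_shift_mask_1.c test):

/* On AArch64 the variable shift instructions already truncate the
   shift amount to the mode width, so an explicit mask by
   GET_MODE_BITSIZE - 1 is redundant and can be folded away.  */
unsigned int
shift_masked (unsigned int x, int n)
{
  return x << (n & 31);	/* 31 == GET_MODE_BITSIZE (SImode) - 1 */
}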

-----Original Message-----
From: Richard Sandiford [mailto:richard.sandif...@linaro.org] 
Sent: Thursday, June 15, 2017 6:40 AM
To: Michael Collison <michael.colli...@arm.com>
Cc: Wilco Dijkstra <wilco.dijks...@arm.com>; Christophe Lyon 
<christophe.l...@linaro.org>; GCC Patches <gcc-patches@gcc.gnu.org>; nd 
<n...@arm.com>
Subject: Re: [PATCH] [Aarch64] Variable shift count truncation issues

Michael Collison <michael.colli...@arm.com> writes:
> +(define_insn_and_split "*aarch64_reg_<mode>3_neg_mask2"
> +  [(set (match_operand:GPI 0 "register_operand" "=r")
> +     (SHIFT:GPI
> +       (match_operand:GPI 1 "register_operand" "r")
> +       (match_operator 4 "subreg_lowpart_operator"
> +       [(neg:SI (and:SI (match_operand:SI 2 "register_operand" "r")
> +                        (match_operand 3 "const_int_operand" "n")))])))
> +   (clobber (match_scratch:SI 5 "=&r"))]
> +  "((~INTVAL (operands[3]) & (GET_MODE_BITSIZE (<MODE>mode)-1)) == 0)"
> +  "#"
> +  "&& reload_completed"
> +  [(const_int 0)]
> +  {
> +    emit_insn (gen_negsi2 (operands[5], operands[2]));
> +
> +    rtx and_op = gen_rtx_AND (SImode, operands[5], operands[3]);
> +    rtx subreg_tmp = gen_rtx_SUBREG (GET_MODE (operands[4]), and_op,
> +                                  SUBREG_BYTE (operands[4]));
> +    emit_insn (gen_<optab><mode>3 (operands[0], operands[1], subreg_tmp));
> +    DONE;
> +  }
> +)

Thanks, I agree this looks correct from the split/reload_completed POV.
I think we can go one better though, either:

(a) Still allow the split when !reload_completed, and use:

     if (GET_MODE (operands[5]) == SCRATCH)
       operands[5] = gen_reg_rtx (SImode);

    This will allow the individual instructions to be scheduled by sched1.

(b) Continue to restrict the split to reload_completed, change operand 0
    to =&r so that it can be used as a temporary, and drop operand 5 entirely.

Or perhaps do both:

(define_insn_and_split "*aarch64_reg_<mode>3_neg_mask2"
  [(set (match_operand:GPI 0 "register_operand" "=&r")
        (SHIFT:GPI
          (match_operand:GPI 1 "register_operand" "r")
          (match_operator 4 "subreg_lowpart_operator"
          [(neg:SI (and:SI (match_operand:SI 2 "register_operand" "r")
                           (match_operand 3 "const_int_operand" "n")))])))]
  "((~INTVAL (operands[3]) & (GET_MODE_BITSIZE (<MODE>mode)-1)) == 0)"
  "#"
  "&& 1"
  [(const_int 0)]
  {
    rtx tmp = (can_create_pseudo_p () ? gen_reg_rtx (SImode)
               : lowpart_subreg (SImode, operands[0], <MODE>mode));
    emit_insn (gen_negsi2 (tmp, operands[2]));

    rtx and_op = gen_rtx_AND (SImode, tmp, operands[3]);
    rtx subreg_tmp = gen_rtx_SUBREG (GET_MODE (operands[4]), and_op,
                                     SUBREG_BYTE (operands[4]));
    emit_insn (gen_<optab><mode>3 (operands[0], operands[1], subreg_tmp));
    DONE;
  }
)
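
FWIW, the kind of input this pattern is intended to catch looks like
the fragment below (a hedged sketch; the function name is made up,
not taken from the testcase):

unsigned long long
shift_by_neg (unsigned long long x, int n)
{
  /* The AND with 63 folds into the shift instruction's implicit
     truncation of its shift amount; only the NEG needs a real
     instruction, which is what the split emits via gen_negsi2.  */
  return x >> (-n & 63);
}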

Sorry for the run-around.  I should have realised earlier that these patterns 
didn't really need a distinct register after RA.

Thanks,
Richard

Attachment: pr5546v5.patch