On Sat, Aug 20, 2011 at 11:52 PM, Richard Henderson <r...@redhat.com> wrote:
> On 08/20/2011 02:16 PM, Uros Bizjak wrote:
>> +(define_insn "bmi2_umul<mode><dwi>3_1"
>> +  [(set (match_operand:<DWI> 0 "register_operand" "=r")
>> +     (mult:<DWI>
>> +       (zero_extend:<DWI>
>> +         (match_operand:DWIH 1 "nonimmediate_operand" "%d"))
>> +       (zero_extend:<DWI>
>> +         (match_operand:DWIH 2 "nonimmediate_operand" "rm"))))]
>> +  "TARGET_BMI
>> +   && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
>> +  "mulx\t{%2, %M0, %N0|%N0, %M0, %2}"
>> +  [(set_attr "type" "imul")
>> +   (set_attr "prefix" "vex")
>> +   (set_attr "mode" "<MODE>")])
>
> You can do better than this, and avoid the %M %N specifiers.
> The outputs are truly independent and do not need to be a pair.
>
> See the mn10300 umulsidi3{,_internal} patterns.

I have tried your suggestion, using patterns like the following:

(define_insn "umulsidi3_1"
  [(set (match_operand:SI 0 "register_operand" "=a,r")
        (mult:SI
          (match_operand:SI 2 "nonimmediate_operand" "%0,d")
          (match_operand:SI 3 "nonimmediate_operand" "rm,rm")))
   (set (match_operand:SI 1 "register_operand" "=d,r")
        (truncate:SI
          (lshiftrt:DI
            (mult:DI (zero_extend:DI (match_dup 2))
                     (zero_extend:DI (match_dup 3)))
            (const_int 32))))
   (clobber (reg:CC FLAGS_REG))]
  "!TARGET_64BIT
   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
  "@
   mull\t%3
   #"
  [(set_attr "isa" "base,bmi2")
   (set_attr "type" "imul,imulx")
   (set_attr "length_immediate" "0,*")
   (set (attr "athlon_decode")
        (cond [(eq_attr "alternative" "0")
                 (if_then_else (eq_attr "cpu" "athlon")
                   (const_string "vector")
                   (const_string "double"))]
              (const_string "*")))
   (set_attr "amdfam10_decode" "double,*")
   (set_attr "bdver1_decode" "direct,*")
   (set_attr "prefix" "orig,vex")
   (set_attr "mode" "SI")])
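
For readers less familiar with RTL, the two sets in the pattern above
can be modelled in C. This is only an illustrative sketch of the
arithmetic, not code from GCC: the struct and function names are
hypothetical, and the model ignores register constraints and the
flags clobber.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type: the two independent SImode outputs of the pattern. */
struct umul_halves {
    uint32_t lo;  /* operand 0: (mult:SI op2 op3) */
    uint32_t hi;  /* operand 1: truncated upper half of the DImode product */
};

/* Hypothetical model of the two SET expressions in umulsidi3_1. */
static struct umul_halves umulsidi3_model(uint32_t a, uint32_t b)
{
    struct umul_halves r;

    /* First set: plain SImode multiply, which wraps modulo 2^32. */
    r.lo = a * b;

    /* Second set: zero-extend both inputs to DImode, multiply,
       shift right by (const_int 32), then truncate to SImode. */
    uint64_t wide = (uint64_t)a * (uint64_t)b;
    r.hi = (uint32_t)(wide >> 32);

    return r;
}

/* Small self-check showing that the two halves recombine into the
   full 64-bit product. */
static void umulsidi3_model_check(void)
{
    struct umul_halves r = umulsidi3_model(0x10000u, 0x10000u);
    assert((((uint64_t)r.hi << 32) | r.lo) == 0x100000000ull);
}
```

Because each half is described by its own set, the register allocator
is free to place the two results in unrelated registers, which is the
point of the suggestion quoted above.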


The compiler works, and for a couple of simple testcases it produces
the same code as with register pairs. However, there are a couple of
problems:

- various length calculations look into operand{0,1,2} to determine
instruction length. This is fixable with a little effort.

- patterns that include (const_int N) do not macroize, and this leads
to pattern explosion. For this simple example, in addition to
splitting out the any_extend pattern, we also have to split the DWIH
patterns.

In the past, I have tried to use match_operand with const_int INTVAL
predicates, but gcc crashed elsewhere due to the additional operand.
Please see [1].

IMO, it is currently too much pain for too little gain to implement
split pairs in existing patterns. I will, however, add a split to the
mulx pattern that splits it after reload into the proposed form, which
avoids %M and %N.

[1] http://gcc.gnu.org/ml/gcc/2010-07/msg00143.html

Uros.
