On Thu, Sep 13, 2012 at 11:25:42AM -0700, Richard Henderson wrote:
> (2) It's not the best match if we were to extend these builtins to FMA4.
> There we really do have 4 inputs. Thus
How could you extend these builtins to FMA4, BTW? Doesn't FMA4 zero the
high elements? In that case you'd need to expand it as a copy of the X
operand register to DEST, then a vfmadd{ss,sd} into a temporary register,
followed by a vmovss/vmovsd instruction to merge the low element into DEST.
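
To make the sequence above concrete, here is a rough software model in plain
C, not actual intrinsics or GCC expander code. It assumes the FMA4 scalar
form clears the upper elements of its destination, as questioned above. The
type v4sf, all the model_* helpers, and self_check are hypothetical names
made up for this sketch. It uses a * b + c rather than a true fused operation
because only the per-element behaviour matters here, not the rounding.

```c
#include <assert.h>
#include <string.h>

/* Software model of a 128-bit vector of four floats; e[0] is the low element. */
typedef struct { float e[4]; } v4sf;

/* Assumed FMA4 scalar behaviour (vfmaddss): the low element receives
   a*b+c and the upper elements of the result are zeroed.
   Rounding of a real fused multiply-add is NOT modelled. */
v4sf model_fma4_vfmaddss(v4sf a, v4sf b, v4sf c)
{
    v4sf r;
    memset(&r, 0, sizeof r);
    r.e[0] = a.e[0] * b.e[0] + c.e[0];
    return r;
}

/* Register form of vmovss dst, src1, src2: the low element comes from
   src2 (here "low"), the upper elements come from src1 (here "upper"). */
v4sf model_vmovss(v4sf upper, v4sf low)
{
    v4sf r = upper;
    r.e[0] = low.e[0];
    return r;
}

/* What the builtin is expected to compute: the low element is a*b+c,
   the upper elements are passed through from x. */
v4sf model_builtin(v4sf x, v4sf a, v4sf b, v4sf c)
{
    v4sf r = x;
    r.e[0] = a.e[0] * b.e[0] + c.e[0];
    return r;
}

/* The three-step expansion described in the text. */
v4sf model_fma4_expansion(v4sf x, v4sf a, v4sf b, v4sf c)
{
    v4sf dest = x;                            /* copy X to DEST         */
    v4sf tmp = model_fma4_vfmaddss(a, b, c);  /* vfmaddss into a temp   */
    return model_vmovss(dest, tmp);           /* merge low element in   */
}

/* Small self-check: the expansion matches the builtin model, while the
   bare FMA4 model loses the upper elements of x. */
void self_check(void)
{
    v4sf x = {{9.0f, 8.0f, 7.0f, 6.0f}};
    v4sf a = {{2.0f, 0.0f, 0.0f, 0.0f}};
    v4sf b = {{3.0f, 0.0f, 0.0f, 0.0f}};
    v4sf c = {{4.0f, 0.0f, 0.0f, 0.0f}};

    v4sf want = model_builtin(x, a, b, c);
    v4sf got = model_fma4_expansion(x, a, b, c);
    v4sf bare = model_fma4_vfmaddss(a, b, c);

    assert(memcmp(&want, &got, sizeof want) == 0);
    assert(bare.e[0] == want.e[0]);
    assert(bare.e[1] == 0.0f && want.e[1] == 8.0f);
}
```

Calling self_check() from a main should pass silently if the model is right.
The point is only that the bare FMA4 result differs from the builtin in the
upper elements, which is why the extra copy and vmovss would be needed.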
> (define_insn "*fmai_fmadd_<mode>_4"
> [(set (match_operand:VF_128 0 "register_operand" "=x,x")
> (vec_merge:VF_128
> (fma:VF_128
> (match_operand:VF_128 1 "nonimmediate_operand" "%x,x")
> (match_operand:VF_128 2 "nonimmediate_operand" " x,m")
> (match_operand:VF_128 3 "nonimmediate_operand" "xm,x"))
> (match_operand:VF_128 4 "register_operand" "0,0")
> (const_int 1)))]
> "TARGET_FMA4"
> "vfmadd<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
> [(set_attr "type" "ssemuladd")
> (set_attr "mode" "<MODE>")])
Jakub