On Mon, Sep 19, 2016 at 05:43:19PM -0500, Segher Boessenkool wrote:
> On Mon, Sep 19, 2016 at 06:02:08PM -0400, Michael Meissner wrote:
> > vector float combine (float a, float b, float c, float d)
> > {
> >   return (vector float) { a, b, c, d };
> > }
> 
> [ ... ]
> 
> > However ISA 2.07 (i.e. power8) added the VMRGEW instruction, which can do
> > this more simply:
> > 
> >         xxpermdi 34,1,2,0
> >         xxpermdi 32,3,4,0
> >         xvcvdpsp 34,34
> >         xvcvdpsp 32,32
> >         vmrgew 2,2,0
> 
> This results in {a,c,b,d} instead?

Yes.

> > --- gcc/config/rs6000/rs6000.c      (.../svn+ssh://meiss...@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000)    (revision 240142)
> > +++ gcc/config/rs6000/rs6000.c      (.../gcc/config/rs6000) (working copy)
> > @@ -6821,11 +6821,26 @@ rs6000_expand_vector_init (rtx target, r
> >       rtx op2 = force_reg (SFmode, XVECEXP (vals, 0, 2));
> >       rtx op3 = force_reg (SFmode, XVECEXP (vals, 0, 3));
> >  
> > -     emit_insn (gen_vsx_concat_v2sf (dbl_even, op0, op1));
> > -     emit_insn (gen_vsx_concat_v2sf (dbl_odd, op2, op3));
> > -     emit_insn (gen_vsx_xvcvdpsp (flt_even, dbl_even));
> > -     emit_insn (gen_vsx_xvcvdpsp (flt_odd, dbl_odd));
> > -     rs6000_expand_extract_even (target, flt_even, flt_odd);
> > +     /* Use VMRGEW if we can instead of doing a permute.  */
> > +     if (TARGET_P8_VECTOR)
> > +       {
> > +         emit_insn (gen_vsx_concat_v2sf (dbl_even, op0, op2));
> > +         emit_insn (gen_vsx_concat_v2sf (dbl_odd, op1, op3));
> 
> But this looks correct, so just the example is pastoed?

Yes, I pasted the code for -mcpu=power7 and -mcpu=power8.  The original code
puts the elements in a different order, and then fixes it up with a permute.  I
changed the order so that it would match how VMRGEW works, and I tested it on
both big and little endian power8 systems.

The original puts the values as:

        +-------+-------+-------+-------+
        | a     | unused| b     | unused|
        +-------+-------+-------+-------+

        +-------+-------+-------+-------+
        | c     | unused| d     | unused|
        +-------+-------+-------+-------+

The VMRGEW instruction wants the registers as:

        +-------+-------+-------+-------+
        | a     | unused| c     | unused|
        +-------+-------+-------+-------+

        +-------+-------+-------+-------+
        | b     | unused| d     | unused|
        +-------+-------+-------+-------+
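
To make the two layouts concrete, here is a rough model in plain C.  It is
only a sketch, not the GCC code: vec4, model_concat_cvt, model_vmrgew and the
two model_combine_* functions are names made up for this illustration, word 0
is the leftmost box in the diagrams (big endian element numbering), and the
unused words are simply set to zero.

```c
#include <assert.h>

/* Hypothetical model of a 128-bit register holding four 32-bit words.
   Word 0 is the leftmost box in the diagrams.  */
typedef struct { float w[4]; } vec4;

/* Model of the concat + XVCVDPSP pair: the two converted values land in
   words 0 and 2; words 1 and 3 are unused (zeroed here by assumption).  */
vec4 model_concat_cvt (float x, float y)
{
  vec4 r = { { x, 0.0f, y, 0.0f } };
  return r;
}

/* Model of VMRGEW: interleave the even-numbered words (0 and 2) of the
   two sources.  */
vec4 model_vmrgew (vec4 a, vec4 b)
{
  vec4 r = { { a.w[0], b.w[0], a.w[2], b.w[2] } };
  return r;
}

/* Original operand order (a,b) and (c,d) fed straight into VMRGEW.  */
vec4 model_combine_old_order (float a, float b, float c, float d)
{
  return model_vmrgew (model_concat_cvt (a, b), model_concat_cvt (c, d));
}

/* Patched operand order (a,c) and (b,d) fed into VMRGEW.  */
vec4 model_combine_new_order (float a, float b, float c, float d)
{
  return model_vmrgew (model_concat_cvt (a, c), model_concat_cvt (b, d));
}

/* Self-check: the old order gives {a,c,b,d}, the new order {a,b,c,d}.  */
void check_models (void)
{
  vec4 o = model_combine_old_order (1.0f, 2.0f, 3.0f, 4.0f);
  vec4 n = model_combine_new_order (1.0f, 2.0f, 3.0f, 4.0f);

  assert (o.w[0] == 1.0f && o.w[1] == 3.0f
          && o.w[2] == 2.0f && o.w[3] == 4.0f);
  assert (n.w[0] == 1.0f && n.w[1] == 2.0f
          && n.w[2] == 3.0f && n.w[3] == 4.0f);
}
```

In this model, feeding the original (a,b)/(c,d) layout into VMRGEW gives
{a,c,b,d}, which is what the pasted assembly showed, while the (a,c)/(b,d)
layout from the patch gives {a,b,c,d}.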

> Okay for trunk if you can clear that up.

Did that answer the question?

> Thanks,
> 
> 
> Segher
> 

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
