On Tue, Feb 4, 2014 at 10:15 PM, Bill Schmidt
<wschm...@linux.vnet.ibm.com> wrote:
> Hi,
>
> One final patch in the series, this one for vec_sum2s.  This builtin
> requires some additional code generation for the case of little endian
> without -maltivec=be.  Here's an example:
>
>   va = {-10,1,2,3};        0x 00000003 00000002 00000001 fffffff6
>   vb = {100,101,102,-103}; 0x ffffff99 00000066 00000065 00000064
>   vc = vec_sum2s (va, vb); 0x ffffff9e 00000000 0000005c 00000000
>                               = {0,92,0,-98};
>
> We need to add -10 + 1 + 101 = 92 and place it in vc[1], and add 2 + 3 +
> -103 = -98 and place the result in vc[3], with zeroes in the other two
> elements.  To do this, we first use "vsldoi vs,vb,vb,12" to rotate 101
> and -103 into big-endian elements 1 and 3, as required by the vsum2sws
> instruction:
>
>   0x ffffff99 00000066 00000065 00000064 ffffff99 00000066 00000065 00000064
>                                 ^^^^^^^^ ^^^^^^^^ ^^^^^^^^ ^^^^^^^^
>                           vs =  00000064 ffffff99 00000066 00000065
>
> Executing "vsum2sws vs,va,vs" then gives
>
>   vs = 0x 00000000 ffffff9e 00000000 0000005c
>
> which then must be shifted into position with "vsldoi vc,vs,vs,4":
>
>   0x 00000000 ffffff9e 00000000 0000005c 00000000 ffffff9e 00000000 0000005c
>               ^^^^^^^^ ^^^^^^^^ ^^^^^^^^ ^^^^^^^^
>          vc = ffffff9e 00000000 0000005c 00000000
>
> which is the desired result.
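
The three-instruction sequence above can be checked with a small scalar
model.  This is only a sketch, not code from the patch: the names vreg,
sat32, from_le, to_le and vec_sum2s_le are hypothetical, registers are
modeled as four words in big-endian register order, and vsldoi is
restricted to whole-word shifts, which is all the sequence needs.

```c
#include <assert.h>
#include <stdint.h>

/* A 128-bit vector register modeled as four 32-bit words in big-endian
   register order: w[0] is the leftmost word.  (Hypothetical type.)  */
typedef struct { int32_t w[4]; } vreg;

/* Saturate a 64-bit sum to the signed 32-bit range.  */
static int32_t sat32(int64_t x)
{
    if (x > INT32_MAX) return INT32_MAX;
    if (x < INT32_MIN) return INT32_MIN;
    return (int32_t)x;
}

/* Model of "vsldoi vt,va,vb,sh", simplified to whole-word shifts:
   concatenate va || vb and take 16 bytes starting at byte offset sh.  */
static vreg vsldoi(vreg a, vreg b, unsigned sh)
{
    assert(sh % 4 == 0 && sh < 16);
    unsigned start = sh / 4;
    vreg r;
    for (unsigned i = 0; i < 4; i++) {
        unsigned j = start + i;
        r.w[i] = (j < 4) ? a.w[j] : b.w[j - 4];
    }
    return r;
}

/* Model of "vsum2sws vt,va,vb": big-endian words 1 and 3 receive the
   saturated sums, words 0 and 2 are zeroed.  */
static vreg vsum2sws(vreg a, vreg b)
{
    vreg r = {{0, 0, 0, 0}};
    r.w[1] = sat32((int64_t)a.w[0] + a.w[1] + b.w[1]);
    r.w[3] = sat32((int64_t)a.w[2] + a.w[3] + b.w[3]);
    return r;
}

/* Little-endian element i lives in register word 3 - i.  */
static vreg from_le(const int32_t e[4])
{
    vreg r;
    for (int i = 0; i < 4; i++)
        r.w[3 - i] = e[i];
    return r;
}

static void to_le(vreg v, int32_t e[4])
{
    for (int i = 0; i < 4; i++)
        e[i] = v.w[3 - i];
}

/* vec_sum2s with little-endian element order, following the
   vsldoi / vsum2sws / vsldoi sequence described in the mail.  */
void vec_sum2s_le(const int32_t va[4], const int32_t vb[4], int32_t vc[4])
{
    vreg a = from_le(va);
    vreg b = from_le(vb);
    vreg s = vsldoi(b, b, 12);   /* rotate vb elements 1 and 3 into BE words 3 and 1 */
    s = vsum2sws(a, s);          /* sums land in BE words 1 and 3 */
    s = vsldoi(s, s, 4);         /* shift sums into LE elements 3 and 1 */
    to_le(s, vc);
}
```

Calling vec_sum2s_le with va = {-10,1,2,3} and vb = {100,101,102,-103}
fills vc with {0,92,0,-98}, matching the example.
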
>
> In addition to this change, I noticed a redundant test from one of my
> previous patches and simplified it.  (BYTES_BIG_ENDIAN implies
> VECTOR_ELT_ORDER_BIG, so we don't need to test BYTES_BIG_ENDIAN.)
>
> As usual, new test cases are added to cover the possible cases.  These
> are simpler this time since only vector signed integer is a legal type
> for vec_sum2s.
>
> Bootstrapped and tested on powerpc64{,le}-unknown-linux-gnu with no
> regressions.  Is this ok for trunk?
>
> Thanks,
> Bill
>
>
> gcc:
>
> 2014-02-04  Bill Schmidt  <wschm...@linux.vnet.ibm.com>
>
>         * config/rs6000/altivec.md (altivec_vsum2sws): Adjust code
>         generation for -maltivec=be.
>         (altivec_vsumsws): Simplify redundant test.
>
> gcc/testsuite:
>
> 2014-02-04  Bill Schmidt  <wschm...@linux.vnet.ibm.com>
>
>         * gcc.dg/vmx/sum2s.c: New.
>         * gcc.dg/vmx/sum2s-be-order.c: New.

Okay.

The multi-instruction sequences really should be emitted as separate
instructions and the scratch only allocated for the LE case.

Thanks, David
