On Tue, 2015-09-01 at 11:01 +0200, Richard Biener wrote:
> On Mon, Aug 31, 2015 at 10:28 PM, Bill Schmidt
> <[email protected]> wrote:
> > Hi,
> >
> > The following simple test fails when attempting to convert a vector
> > shift-by-scalar into a vector shift-by-vector.
> >
> > typedef unsigned char v16ui __attribute__((vector_size(16)));
> >
> > v16ui vslb(v16ui v, unsigned char i)
> > {
> > return v << i;
> > }
> >
> > When this code is gimplified, the shift amount gets expanded to an
> > unsigned int:
> >
> > vslb (v16ui v, unsigned char i)
> > {
> > v16ui D.2300;
> > unsigned int D.2301;
> >
> > D.2301 = (unsigned int) i;
> > D.2300 = v << D.2301;
> > return D.2300;
> > }
> >
> > In expand_binop, the shift-by-scalar is converted into a shift-by-vector
> > using expand_vector_broadcast, which produces the following rtx to be
> > used to initialize a V16QI vector:
> >
> > (parallel:V16QI [
> > (subreg/s/v:SI (reg:DI 155) 0)
> > (subreg/s/v:SI (reg:DI 155) 0)
> > (subreg/s/v:SI (reg:DI 155) 0)
> > (subreg/s/v:SI (reg:DI 155) 0)
> > (subreg/s/v:SI (reg:DI 155) 0)
> > (subreg/s/v:SI (reg:DI 155) 0)
> > (subreg/s/v:SI (reg:DI 155) 0)
> > (subreg/s/v:SI (reg:DI 155) 0)
> > (subreg/s/v:SI (reg:DI 155) 0)
> > (subreg/s/v:SI (reg:DI 155) 0)
> > (subreg/s/v:SI (reg:DI 155) 0)
> > (subreg/s/v:SI (reg:DI 155) 0)
> > (subreg/s/v:SI (reg:DI 155) 0)
> > (subreg/s/v:SI (reg:DI 155) 0)
> > (subreg/s/v:SI (reg:DI 155) 0)
> > (subreg/s/v:SI (reg:DI 155) 0)
> > ])
> >
> > The back end eventually chokes trying to generate a copy of the SImode
> > expression into a QImode memory slot.
> >
> > This patch fixes this problem by ensuring that the shift amount is
> > truncated to the inner mode of the vector when necessary. I've added a
> > test case verifying correct PowerPC code generation in this case.
> >
> > Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
> > regressions. Is this ok for trunk?
> >
> > Thanks,
> > Bill
> >
> >
> > [gcc]
> >
> > 2015-08-31 Bill Schmidt <[email protected]>
> >
> > * optabs.c (expand_binop): Don't create a broadcast vector with a
> > source element wider than the inner mode.
> >
> > [gcc/testsuite]
> >
> > 2015-08-31 Bill Schmidt <[email protected]>
> >
> > * gcc.target/powerpc/vec-shift.c: New test.
> >
> >
> > Index: gcc/optabs.c
> > ===================================================================
> > --- gcc/optabs.c (revision 227353)
> > +++ gcc/optabs.c (working copy)
> > @@ -1608,6 +1608,13 @@ expand_binop (machine_mode mode, optab binoptab, r
> >
> > if (otheroptab && optab_handler (otheroptab, mode) != CODE_FOR_nothing)
> > {
> > + /* The scalar may have been extended to be too wide. Truncate
> > + it back to the proper size to fit in the broadcast vector. */
> > + machine_mode inner_mode = GET_MODE_INNER (mode);
> > + if (GET_MODE_BITSIZE (inner_mode)
> > + < GET_MODE_BITSIZE (GET_MODE (op1)))
>
> Does that work for modeless constants? Btw, what do other targets do
> here? Do they also choke or do they cope with the wide operand?

Good question. This works by serendipity more than by design. Because
a constant has a mode of VOIDmode, its bitsize is 0 and the TRUNCATE
won't be generated. It would be better for me to put in an explicit
check for CONST_INT rather than relying on this, though. I'll fix that.
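To make the difference concrete, here is a standalone toy model of the
two forms of the check. It is only a sketch: the enum, the bit-size
table, and both helper names are hypothetical stand-ins, not GCC's
machine_mode API. The one fact carried over from above is that a
modeless constant reports a bit size of 0.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for a few machine modes (not GCC's enum).  */
enum toy_mode { TOY_VOID, TOY_QI, TOY_SI, TOY_DI };

/* Bit size of a toy mode; the modeless case reports 0.  */
static unsigned
toy_bitsize (enum toy_mode m)
{
  switch (m)
    {
    case TOY_QI: return 8;
    case TOY_SI: return 32;
    case TOY_DI: return 64;
    default:     return 0;
    }
}

/* Implicit form: a modeless operand is skipped only because its
   size happens to be 0, which is never greater than the inner size.  */
static bool
needs_truncate_implicit (enum toy_mode inner, enum toy_mode op)
{
  return toy_bitsize (inner) < toy_bitsize (op);
}

/* Explicit form: say outright that a modeless operand is left alone,
   then compare widths for everything else.  */
static bool
needs_truncate_explicit (enum toy_mode inner, enum toy_mode op)
{
  if (op == TOY_VOID)
    return false;
  return toy_bitsize (inner) < toy_bitsize (op);
}
```

Both helpers give the same answers for every input here; the explicit
form just stops depending on the size-0 convention.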

I am not sure what other targets do here; I can check. However, do you
think that's relevant? I'm concerned that

(parallel:V16QI [
(subreg/s/v:SI (reg:DI 155) 0)
(subreg/s/v:SI (reg:DI 155) 0)
(subreg/s/v:SI (reg:DI 155) 0)
(subreg/s/v:SI (reg:DI 155) 0)
(subreg/s/v:SI (reg:DI 155) 0)
(subreg/s/v:SI (reg:DI 155) 0)
(subreg/s/v:SI (reg:DI 155) 0)
(subreg/s/v:SI (reg:DI 155) 0)
(subreg/s/v:SI (reg:DI 155) 0)
(subreg/s/v:SI (reg:DI 155) 0)
(subreg/s/v:SI (reg:DI 155) 0)
(subreg/s/v:SI (reg:DI 155) 0)
(subreg/s/v:SI (reg:DI 155) 0)
(subreg/s/v:SI (reg:DI 155) 0)
(subreg/s/v:SI (reg:DI 155) 0)
(subreg/s/v:SI (reg:DI 155) 0)
])

is a nonsensical expression and shouldn't be produced by common code, in
my view. It seems best to make this explicitly correct. Please let me
know if that's off-base.
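
For illustration, the intended semantics can be modelled in plain C
without any vector types. This is a sketch under stated assumptions,
not the expander's code: LANES, broadcast_u8 and shift_left_by_vector
are hypothetical names, and masking each lane's shift count to its low
3 bits is my understanding of what vslb does, so treat that as an
assumption too.

```c
#include <stddef.h>
#include <stdint.h>

#define LANES 16  /* hypothetical: one lane per byte of a 16-byte vector */

/* Truncate the widened shift amount to the lane width first, then
   replicate it, so every lane holds a byte-sized element.  */
static void
broadcast_u8 (uint8_t out[LANES], unsigned int amount)
{
  uint8_t lane = (uint8_t) amount;
  for (size_t k = 0; k < LANES; k++)
    out[k] = lane;
}

/* Per-lane shift-by-vector.  The "& 7" is an assumption modelling a
   byte shift that only looks at the low 3 bits of each count.  */
static void
shift_left_by_vector (uint8_t v[LANES], const uint8_t amt[LANES])
{
  for (size_t k = 0; k < LANES; k++)
    v[k] = (uint8_t) (v[k] << (amt[k] & 7));
}
```

In this model the broadcast input is narrowed before it is replicated,
which is the property the V16QI parallel above lacks.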
Thanks,
Bill
>
> > + op1 = simplify_gen_unary (TRUNCATE, inner_mode, op1,
> > + GET_MODE (op1));
> > rtx vop1 = expand_vector_broadcast (mode, op1);
> > if (vop1)
> > {
> > Index: gcc/testsuite/gcc.target/powerpc/vec-shift.c
> > ===================================================================
> > --- gcc/testsuite/gcc.target/powerpc/vec-shift.c (revision 0)
> > +++ gcc/testsuite/gcc.target/powerpc/vec-shift.c (working copy)
> > @@ -0,0 +1,20 @@
> > +/* { dg-do compile { target { powerpc*-*-* } } } */
> > +/* { dg-require-effective-target powerpc_altivec_ok } */
> > +/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
> > +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power7" } } */
> > +/* { dg-options "-mcpu=power7 -O2" } */
> > +
> > +/* This used to ICE. During gimplification, "i" is widened to an unsigned
> > + int. We used to fail at expand time as we tried to cram an SImode item
> > + into a QImode memory slot. This has been fixed to properly truncate the
> > + shift amount when splatting it into a vector. */
> > +
> > +typedef unsigned char v16ui __attribute__((vector_size(16)));
> > +
> > +v16ui vslb(v16ui v, unsigned char i)
> > +{
> > + return v << i;
> > +}
> > +
> > +/* { dg-final { scan-assembler "vspltb" } } */
> > +/* { dg-final { scan-assembler "vslb" } } */
> >
> >
> >
>