https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119702

--- Comment #10 from Peter Bergner <bergner at gcc dot gnu.org> ---
(In reply to Segher Boessenkool from comment #4)
> No, we should generate code as Peter says in #c1.  Doing a shift is worse
> code.

Agreed.  If we look at the following test case:

bergner@cfarm120:~$ cat pr119702.c
#include <altivec.h>

vector unsigned long long
add (vector unsigned long long a)
{
  return a + a;
}
vector unsigned long long
shift (vector unsigned long long a)
{
  return a << 1;
}
vector unsigned long long
mult (vector unsigned long long a)
{
  return a * 2;
}

...we get with trunk:
bergner@cfarm120:~$ gcc -S -O2 -mcpu=power9 pr119702.c
bergner@cfarm120:~$ cat pr119702.s
add:
        vaddudm 2,2,2
        blr
shift:
        vspltisw 0,1
        vsld 2,2,0
        blr
mult:
        mfvsrld 10,34
        mfvsrd 9,34
        sldi 9,9,1
        sldi 10,10,1
        mtvsrdd 34,9,10
        blr

...when all three should generate a single vaddudm.  The mult code is
especially bad: it moves both doublewords to GPRs, shifts them there, and
moves them back.  With -mcpu=power10, the only difference is that mult
generates the same code as shift.  I'm not sure why mult on power9 doesn't
produce that same code.
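
As a sanity check on why all three functions can share one instruction, here
is a minimal scalar sketch in portable C.  It assumes unsigned 64-bit lanes
with wrap-around arithmetic, and it models the vector as a two-element
struct instead of using <altivec.h>.  The names v2u64, model_add,
model_shift, model_mult and check_forms_agree are hypothetical and are not
part of the test case or of GCC.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for "vector unsigned long long": two 64-bit lanes. */
typedef struct { uint64_t lane[2]; } v2u64;

/* Lane-wise a + a (what vaddudm 2,2,2 computes). */
static v2u64 model_add(v2u64 a)
{
  for (int i = 0; i < 2; i++)
    a.lane[i] = a.lane[i] + a.lane[i];
  return a;
}

/* Lane-wise a << 1. */
static v2u64 model_shift(v2u64 a)
{
  for (int i = 0; i < 2; i++)
    a.lane[i] = a.lane[i] << 1;
  return a;
}

/* Lane-wise a * 2. */
static v2u64 model_mult(v2u64 a)
{
  for (int i = 0; i < 2; i++)
    a.lane[i] = a.lane[i] * 2;
  return a;
}

/* Returns 1 when both lanes of x and y are equal, 0 otherwise. */
static int same(v2u64 x, v2u64 y)
{
  return x.lane[0] == y.lane[0] && x.lane[1] == y.lane[1];
}

/* Asserts that add, shift and mult agree for the given lane values. */
void check_forms_agree(uint64_t lo, uint64_t hi)
{
  v2u64 a = { { lo, hi } };
  assert(same(model_add(a), model_shift(a)));
  assert(same(model_add(a), model_mult(a)));
}
```

Calling check_forms_agree (0, UINT64_MAX) exercises the wrap-around case, in
which the top bit of a lane is simply dropped by all three forms.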

Segher, is this a case of needing to add a combiner pattern to translate that
splat/shift-by-one into an add of the operand to itself?
