https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119702
--- Comment #10 from Peter Bergner <bergner at gcc dot gnu.org> ---
(In reply to Segher Boessenkool from comment #4)
> No, we should generate code as Peter says in #c1.  Doing a shift is worse
> code.

Agreed.  If we look at the following test case:

bergner@cfarm120:~$ cat pr119702.c
#include <altivec.h>

vector unsigned long long
add (vector unsigned long long a)
{
  return a + a;
}

vector unsigned long long
shift (vector unsigned long long a)
{
  return a << 1;
}

vector unsigned long long
mult (vector unsigned long long a)
{
  return a * 2;
}

...we get with trunk:

bergner@cfarm120:~$ gcc -S -O2 -mcpu=power9 pr119702.c
bergner@cfarm120:~$ cat pr119702.s
add:
        vaddudm 2,2,2
        blr
shift:
        vspltisw 0,1
        vsld 2,2,0
        blr
mult:
        mfvsrld 10,34
        mfvsrd 9,34
        sldi 9,9,1
        sldi 10,10,1
        mtvsrdd 34,9,10
        blr

...when they should all generate vaddudm.  That mult code is really bad!

The only difference when using -mcpu=power10 is that the mult code ends up
with the same code as shift.  I'm not sure why the power9 mult code doesn't
produce the same code as power10 mult.

Segher, is this a case of needing to add a combiner pattern to translate that
splat/shift into an add of itself?
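
For reference, all three functions can share one instruction because, for unsigned 64-bit lanes, a + a, a << 1 and a * 2 are the same value modulo 2^64. Below is a minimal scalar sketch of that equivalence in portable C. The v2du struct and the v_* helper names are hypothetical stand-ins for the vector type; they are not part of the test case or of GCC, and the sketch says nothing about which pattern the backend should use.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical two-lane model of "vector unsigned long long". */
typedef struct { uint64_t lane[2]; } v2du;

/* a + a, lane by lane (what vaddudm computes). */
static inline v2du v_add(v2du a)
{
    v2du r = {{ a.lane[0] + a.lane[0], a.lane[1] + a.lane[1] }};
    return r;
}

/* a << 1, lane by lane (what the splat plus vsld sequence computes). */
static inline v2du v_shift(v2du a)
{
    v2du r = {{ a.lane[0] << 1, a.lane[1] << 1 }};
    return r;
}

/* a * 2, lane by lane. */
static inline v2du v_mult(v2du a)
{
    v2du r = {{ a.lane[0] * 2u, a.lane[1] * 2u }};
    return r;
}

static inline int v_equal(v2du x, v2du y)
{
    return x.lane[0] == y.lane[0] && x.lane[1] == y.lane[1];
}

/* Returns 1 when all three formulations agree for the given input. */
static inline int all_forms_agree(v2du a)
{
    v2du s = v_add(a);
    return v_equal(s, v_shift(a)) && v_equal(s, v_mult(a));
}

/* Spot check on inputs whose top bit is set, so the result wraps. */
static inline void self_check(void)
{
    v2du t = {{ 0x8000000000000001ULL, UINT64_MAX }};
    assert(all_forms_agree(t));
}
```

Unsigned arithmetic in C wraps modulo 2^64, so the carry out of the top bit is discarded identically in all three forms. That is why rewriting the shift or the multiply as an add of the operand to itself preserves the result.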