[Bug target/119702] PPCLE: Inefficient auto-vectorization for 64-bit shifts on Power9

segher at gcc dot gnu.org via Gcc-bugs Thu, 07 Aug 2025 09:06:31 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119702


--- Comment #19 from Segher Boessenkool <segher at gcc dot gnu.org> ---
(In reply to Avinash Jayakar from comment #17)
> I looked at the slp vectorization pass that converts scalar gimple code to

"straight-line parallelisation".  Some "scalar" (whatever that means) things
are most optimally implemented using "vector" instructions.  The SLP pass is
what makes this happen.

> vectorized gimple. Analysis happens in vect_slp_analyze_bb_1 before actually
> scheduling in vect_schedule_slp. There are multiple patterns written here to
> optimize simple operations such as *2 to <<1 in vect_recog_mult_pattern. 
> I have added a pattern just for detecting left shift by one and replacing it
> by add in vect_recog_lshift_by_one_pattern. Either this can be done, or I
> can move this logic in a shift pattern (vect_recog_widen_shift_pattern or
> vect_recog_vector_vector_shift_pattern). 
> 
> This does fix the original issue, where <<1 generates 2 instructions. With
> this patch it just generates 1 add instruction when code is vectorized. But
> other cases like *2 and a = a+a are not handled right now.
> 
> @Segher, I had a few questions on this 

This is bugzilla, this is not twitter, "@" means nothing (and my username is
"segher", not "Segher").

> - Do you suggest moving ahead in this direction? Since here I am
> manipulating the GIMPLE, it will affect different architectures as well,
> would this be ok?

If it does something that does make sense here, it is a good addition.  For
other archs as well (although Gimple-level optimisations are so very far away
from the eventual machine code that it is hard to talk about the machine code
there at all: you are transforming some bit of Gimple code to some nicer piece
of Gimple code!)
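
As an illustration of the rewrite being discussed (a left shift by one
becoming an add), here is a minimal sketch in plain C.  It is not GCC code:
the helper names and the two-lane model of a 128-bit vector are assumptions
made up for this example.  It only shows that, for unsigned 64-bit lanes,
"splat 1, then shift" and "add the value to itself" give the same result,
which is what lets two instructions become one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: two 64-bit lanes, as in one 128-bit vector register.  */
enum { LANES = 2 };

/* Form 1: build a vector of ones (the "splat"), then shift each lane by it.  */
static void lanes_shl1 (const uint64_t *in, uint64_t *out)
{
  uint64_t ones[LANES];
  for (size_t i = 0; i < LANES; i++)
    ones[i] = 1;
  for (size_t i = 0; i < LANES; i++)
    out[i] = in[i] << ones[i];
}

/* Form 2: add each lane to itself; no constant vector is needed.  */
static void lanes_add_self (const uint64_t *in, uint64_t *out)
{
  for (size_t i = 0; i < LANES; i++)
    out[i] = in[i] + in[i];
}

/* Return 1 if both forms agree on every lane of IN, 0 otherwise.  */
static int lanes_forms_agree (const uint64_t *in)
{
  uint64_t a[LANES], b[LANES];
  lanes_shl1 (in, a);
  lanes_add_self (in, b);
  for (size_t i = 0; i < LANES; i++)
    if (a[i] != b[i])
      return 0;
  return 1;
}
```

Unsigned arithmetic wraps modulo 2^64, so the two forms agree even when the
top bit is shifted out.  Whether the add form is actually cheaper is a
target question, not something this sketch can show.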

> (In reply to Segher Boessenkool from comment #15)
> > Just have it recognised by a define_insn that generates an addition insn
> > when generating assembler code.  You know...  the same as always :-)
> > 
> 
> - Thank you for this suggestion, I did give this a try but ran into a few
> issues. Is there a way in define_insn to detect that one of the operands in
> rtl is dead?

Yes.

dead_or_set_p perhaps.  It all depends on context.  You can use all of DF as
well of course.

> Because we need to be sure that const_1 is not used anywhere
> further before replacing the 2 rtl insns (splat and shift), with just 1
> (plus). I checked the define_peephole2, it provides a way to check if an
> operand is dead. Would using the peephole pass for this make sense?

Peepholes make no sense ever, hehe.  Sometimes they are the most convenient
solution though.

You are thinking about peep2_reg_dead_p?

There are better, more modern, solutions almost always :-)  Text-based
peepholes have been eradicated from most places, now peephole2 should go the
way of the dodo :-)
