Re: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction

Jatin Bhateja Wed, 06 Nov 2024 09:40:28 -0800

On Fri, 11 Oct 2024 17:12:49 GMT, Jasmine Karthikeyan 
<jkarthike...@openjdk.org> wrote:


> > I have a similar idea, which is to group those transformations together
> > into a `Phase` called `PhaseLowering`.
> 
> I think such a phase could be quite useful in general. Recently I was trying
> to implement the BMI1 instruction `bextr` for better performance with bit
> masks, but ran into a problem: it doesn't have an immediate encoding, so
> we'd need to materialize a constant into a temporary register every time. With
> an (x86-specific) ideal node, we could simply let the register allocator
> handle placing the constant. It would also be nice to avoid needing to put
> similar backend-specific lowerings (such as `MacroLogicV`) in shared code.
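To make the `bextr` point concrete, here is a minimal scalar model of the 64-bit BEXTR semantics as described in the Intel SDM (START in bits 7:0 and LEN in bits 15:8 of the control operand, which is a register, not an immediate). `BextrModel`, `control` and `bextr` are hypothetical names used only for illustration; this is not HotSpot code. It shows that the common Java idiom `(x >>> 4) & 0xFF` maps to a single BEXTR only if the packed constant `4 | (8 << 8)` is first placed in a register.

```java
// Scalar model of BMI1 BEXTR (64-bit form), for illustration only.
// Class and method names are hypothetical, not HotSpot code.
final class BextrModel {
    // Packs START into bits 7:0 and LEN into bits 15:8 of the control operand.
    static long control(int start, int len) {
        return (start & 0xFF) | ((long) (len & 0xFF) << 8);
    }

    // BEXTR dst, src, ctrl: extract LEN bits of src starting at bit START,
    // zeroing all higher bits of the result.
    static long bextr(long src, long ctrl) {
        int start = (int) (ctrl & 0xFF);
        int len = (int) ((ctrl >>> 8) & 0xFF);
        if (start >= 64 || len == 0) return 0L;
        long shifted = src >>> start;
        // Java masks long shift counts to 6 bits, so LEN >= 64 must not reach (1L << len).
        return len >= 64 ? shifted : shifted & ((1L << len) - 1);
    }

    public static void main(String[] args) {
        long x = 0xABCDL;
        // Typical Java bit-field extraction with a shift and a mask.
        long viaMask = (x >>> 4) & 0xFF;
        // Same extraction via BEXTR: the control word (4 | 8 << 8) must live in a register.
        long viaBextr = bextr(x, control(4, 8));
        System.out.println(Long.toHexString(viaMask) + " " + Long.toHexString(viaBextr)); // bc bc
    }
}
```

It can be run directly with the single-file launcher (`java BextrModel.java`, JDK 11+).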

Hey @jaskarth, @merykitty, we already have an infrastructure where, during
parsing, we create macro nodes which can be lowered / expanded into multiple IR
nodes during macro expansion. What we need in this case is a target-specific IR
pattern check, since not all targets may support a 32x32 multiplication
producing a full quadword (64-bit) result. The idea is to avoid creating a new
IR node and instead piggyback the needed information on the existing MulVL IR
node; we already use such tricks for relaxed unsafe reductions. Going forward,
infusing KnownBits into our data flow analysis infrastructure will streamline
such optimizations; this patch performs a point optimization for a specific set
of constrained multiplication patterns.
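As an illustration of the kind of pattern check described above, the following sketch models, in plain Java, which MulVL input shapes could be lowered to VPMULUDQ (both inputs known to have zero upper 32 bits) or VPMULDQ (both inputs known to be sign-extended 32-bit values), together with the per-lane arithmetic those two instructions perform. The node records (`Input`, `AndMask`, `SignExt32`), the `Lowering` enum and the `select` method are hypothetical; they are not the actual C2 classes or the matching logic in this PR, and the input shapes checked are assumptions chosen for illustration.

```java
// Hypothetical model of a MulVL pattern check; node types and method
// names are illustrative only and are not HotSpot C2 code.
final class MulDqModel {
    enum Lowering { VPMULUDQ, VPMULDQ, GENERIC_MULVL }

    // Minimal stand-ins for the IR feeding one MulVL input.
    interface Node {}
    record Input(String name) implements Node {}
    // Models AndV(x, Replicate(mask)) as a single node.
    record AndMask(Node in, long mask) implements Node {}
    // Models (x << 32) >> 32 with an arithmetic shift: sign-extends the low word.
    record SignExt32(Node in) implements Node {}

    // Upper 32 bits are known zero if the AND mask has no bits set above bit 31.
    static boolean zeroExtended(Node n) {
        return n instanceof AndMask m && (m.mask() >>> 32) == 0;
    }

    // Value is known to be a sign-extended 32-bit int.
    static boolean signExtended(Node n) {
        return n instanceof SignExt32;
    }

    // Both inputs must satisfy the same constraint; otherwise keep the full 64x64 multiply.
    static Lowering select(Node a, Node b) {
        if (zeroExtended(a) && zeroExtended(b)) return Lowering.VPMULUDQ;
        if (signExtended(a) && signExtended(b)) return Lowering.VPMULDQ;
        return Lowering.GENERIC_MULVL;
    }

    // Per-lane result of VPMULUDQ: unsigned low doublewords, full 64-bit product.
    static long vpmuludqLane(long a, long b) {
        return (a & 0xFFFFFFFFL) * (b & 0xFFFFFFFFL);
    }

    // Per-lane result of VPMULDQ: signed low doublewords, full 64-bit product.
    static long vpmuldqLane(long a, long b) {
        return (long) (int) a * (long) (int) b;
    }

    public static void main(String[] args) {
        Node x = new Input("x"), y = new Input("y");
        System.out.println(select(new AndMask(x, 0xFFFFFFFFL), new AndMask(y, 0xFFFFL))); // VPMULUDQ
        System.out.println(select(new SignExt32(x), new SignExt32(y)));                  // VPMULDQ
        System.out.println(select(x, new AndMask(y, 0xFFFFFFFFL)));                      // GENERIC_MULVL

        // For zero-extended inputs, the 64x64 multiply and VPMULUDQ agree.
        long za = 0xFFFFFFFFL;
        System.out.println(za * za == vpmuludqLane(za, za)); // true

        // Without the check the lowering would be wrong: the upper word of a is dropped.
        long a = 0x1_0000_0001L, b = 2L;
        System.out.println(a * b == vpmuludqLane(a, b)); // false
    }
}
```

The last comparison in `main` shows why the pattern check is required: when an input's upper word is not known to be zero (or a sign extension), the 32x32 multiply silently discards those bits. The sketch needs JDK 16+ (records and pattern matching for `instanceof`).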

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2411053693
