Re: RFR: 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction [v4]

Quan Anh Mai Thu, 17 Oct 2024 22:52:06 -0700

On Fri, 18 Oct 2024 05:35:28 GMT, Vladimir Ivanov <[email protected]> wrote:


>> Jatin Bhateja has updated the pull request with a new target base due to a 
>> merge or a rebase. The pull request now contains two commits:
>> 
>>  - Review resolutions
>>  - 8341137: Optimize long vector multiplication using x86 VPMULUDQ 
>> instruction
>
> src/hotspot/share/opto/vectornode.cpp line 2122:
> 
>> 2120:     // MulL (URShift SRC1 , 32) (URShift SRC2, 32)
>> 2121:     // MulL (URShift SRC1 , 32)  ( And  SRC2,  0xFFFFFFFF)
>> 2122:     // MulL ( And  SRC1,  0xFFFFFFFF) (URShift SRC2 , 32)
> 
> I don't understand how it works... According to the documentation, 
> `VPMULDQ`/`VPMULUDQ` consume vectors of double words and produce a vector of 
> quadwords. But it looks like `SRC1`/`SRC2` are always vectors of longs 
> (quadwords). And `vmuludq_reg` in `x86.ad` just takes the immediate operands 
> and passes them into `vpmuludq` which doesn't look right...

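`vpmuludq` does a long (64-bit) multiplication in each quadword lane but 
ignores the upper 32 bits of both operands, so it effectively computes 
`(x & max_juint) * (y & max_juint)`. When the upper halves of both inputs are 
already zero, as in the `URShift`/`And` patterns quoted above, that is the 
same result as a full `MulL`.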
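
To make this concrete, here is a rough scalar model of a single quadword lane. 
It is only a sketch: the class and method names (`VpmuludqModel`, 
`vpmuludqLane`) and the constant `MAX_JUINT` are made up for illustration and 
do not appear in the patch.

```java
class VpmuludqModel {
    // Hypothetical stand-in for HotSpot's max_juint (2^32 - 1), widened to long.
    public static final long MAX_JUINT = 0xFFFFFFFFL;

    // Assumed scalar model of one 64-bit lane of vpmuludq: only the low
    // 32 bits of each operand take part, and the product is a full 64 bits.
    public static long vpmuludqLane(long x, long y) {
        return (x & MAX_JUINT) * (y & MAX_JUINT);
    }

    public static void main(String[] args) {
        long src1 = 0x0000000100000001L; // upper half is non-zero
        long src2 = 2L;

        // Arbitrary longs: the upper bits of src1 are dropped, so the lane
        // result differs from the ordinary 64-bit product.
        System.out.println(src1 * src2 == vpmuludqLane(src1, src2));

        // Operands shaped like the matched patterns: URShift by 32 or And with
        // 0xFFFFFFFF leaves the upper half zero, so both products agree.
        long a = src1 >>> 32;
        long b = src2 & MAX_JUINT;
        System.out.println(a * b == vpmuludqLane(a, b));
    }
}
```

The first comparison prints `false` and the second prints `true`, which is 
the reason the transformation is only valid when both inputs are known to 
have zero upper halves.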

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21244#discussion_r1805887594
