On Fri, 18 Oct 2024 05:35:28 GMT, Vladimir Ivanov <vliva...@openjdk.org> wrote:
>> Jatin Bhateja has updated the pull request with a new target base due to a
>> merge or a rebase. The pull request now contains two commits:
>>
>>  - Review resolutions
>>  - 8341137: Optimize long vector multiplication using x86 VPMULUDQ
>>    instruction
>
> src/hotspot/share/opto/vectornode.cpp line 2122:
>
>> 2120: //    MulL (URShift SRC1 , 32) (URShift SRC2, 32)
>> 2121: //    MulL (URShift SRC1 , 32) ( And    SRC2, 0xFFFFFFFF)
>> 2122: //    MulL ( And    SRC1, 0xFFFFFFFF) (URShift SRC2 , 32)
>
> I don't understand how it works... According to the documentation,
> `VPMULDQ`/`VPMULUDQ` consume vectors of double words and produce a vector of
> quadwords. But it looks like `SRC1`/`SRC2` are always vectors of longs
> (quadwords). And `vmuludq_reg` in `x86.ad` just takes the immediate operands
> and passes them into `vpmuludq`, which doesn't look right...

`vpmuludq` does a long multiplication but throws away the upper 32 bits of each operand, so each 64-bit lane effectively computes `(x & max_juint) * (y & max_juint)`.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21244#discussion_r1805887594
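
For illustration only, the per-lane behaviour described in the reply can be modelled in scalar Java. This is a sketch, not code from the PR: the class `VpmuludqLaneModel` and the method `vpmuludqLane` are hypothetical names, and `0xFFFFFFFFL` stands in for HotSpot's `max_juint`.

```java
public class VpmuludqLaneModel {
    // Hypothetical scalar model of one 64-bit lane of VPMULUDQ: only the low
    // 32 bits of each operand are read (as unsigned values), and the full
    // 64-bit product is kept.
    public static long vpmuludqLane(long x, long y) {
        return (x & 0xFFFFFFFFL) * (y & 0xFFFFFFFFL);
    }

    public static void main(String[] args) {
        long src1 = 0x1234_5678_9ABC_DEF0L;
        long src2 = 0x0FED_CBA9_8765_4321L;

        // Shape from the quoted comment:
        // MulL (And SRC1, 0xFFFFFFFF) (URShift SRC2, 32).
        // Both inputs already have zero upper halves, so discarding those
        // halves does not change the result of the 64-bit multiply.
        long lhs = src1 & 0xFFFFFFFFL;
        long rhs = src2 >>> 32;
        System.out.println(lhs * rhs == vpmuludqLane(lhs, rhs));

        // For arbitrary longs the upper halves simply do not participate.
        System.out.println(
            vpmuludqLane(src1, src2) == vpmuludqLane(src1 & 0xFFFFFFFFL, src2 & 0xFFFFFFFFL));
    }
}
```

Under this model, when both inputs are known to have zero upper halves, as in the three `MulL` shapes quoted above, a plain 64-bit multiply and the modelled lane agree.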