Re: RFR: 8329077: C2 SuperWord: Add MoveD2L, MoveL2D, MoveF2I, MoveI2F [v3]

Emanuel Peter Sat, 09 Aug 2025 22:26:55 -0700

On Tue, 5 Aug 2025 11:39:43 GMT, Galder Zamarreño <[email protected]> wrote:


>> I've added support to vectorize `MoveD2L`, `MoveL2D`, `MoveF2I` and 
>> `MoveI2F` nodes. The implementation follows a similar pattern to what is 
>> done with conversion (`Conv*`) nodes. The tests in 
>> `TestCompatibleUseDefTypeSize` have been updated with the new expectations.
>> 
>> Also added a JMH benchmark which measures throughput (the higher the number 
>> the better) for methods that exercise these nodes. On darwin/aarch64 it 
>> shows:
>> 
>> 
>> Benchmark                                (seed)  (size)   Mode  Cnt      Base      Patch   Units   Diff
>> VectorBitConversion.doubleToLongBits          0    2048  thrpt    8  1168.782   1157.717  ops/ms    -1%
>> VectorBitConversion.doubleToRawLongBits       0    2048  thrpt    8  3999.387   7353.936  ops/ms   +83%
>> VectorBitConversion.floatToIntBits            0    2048  thrpt    8  1200.338   1188.206  ops/ms    -1%
>> VectorBitConversion.floatToRawIntBits         0    2048  thrpt    8  4058.248  14792.474  ops/ms  +264%
>> VectorBitConversion.intBitsToFloat            0    2048  thrpt    8  3050.313  14984.246  ops/ms  +391%
>> VectorBitConversion.longBitsToDouble          0    2048  thrpt    8  3022.691   7379.360  ops/ms  +144%
>> 
>> 
>> The improvements observed are a result of vectorization. `doubleToLongBits` 
>> and `floatToIntBits` do not vectorize because of control flow, and their 
>> unchanged results show that these changes do not affect their 
>> performance.
>> 
>> I've run the tier1-3 tests on linux/aarch64 and didn't observe any 
>> regressions.
>
> Galder Zamarreño has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Check at the very least that auto vectorization is supported

src/hotspot/share/opto/superword.cpp line 1635:

> 1633:     } else if (VectorNode::is_convert_opcode(opc)) {
> 1634:       retValue = VectorCastNode::implemented(opc, size, velt_basic_type(p0->in(1)), velt_basic_type(p0));
> 1635:     } else if (VectorNode::is_reinterpret_opcode(opc)) {

How does this affect `Op_ReinterpretHF2S`, which is also in 
`VectorNode::is_reinterpret_opcode`?
I'm afraid we need to test this on real hardware or with Intel's SDE, to make 
sure it runs on a VM that actually supports Float16. Otherwise these 
instructions may not be used, and hence not tested properly.

@galderz Can you run the relevant tests?
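
For context on the quoted benchmark numbers, here is a minimal Java sketch of the two loop shapes being compared. It is not the JMH benchmark from the PR: the class name `BitConversionSketch` and the method names `rawBits` and `canonicalBits` are made up for illustration. The raw variant is a pure bit reinterpretation, which is presumably what the `MoveD2L` node represents, while `Double.doubleToLongBits` additionally collapses every NaN to the canonical pattern `0x7ff8000000000000L`, which is the control flow mentioned in the description.

```java
public class BitConversionSketch {
    // Branch-free per-element work: each double is reinterpreted as a long.
    public static void rawBits(double[] in, long[] out) {
        for (int i = 0; i < in.length; i++) {
            out[i] = Double.doubleToRawLongBits(in[i]);
        }
    }

    // Same loop shape, but doubleToLongBits maps every NaN to the canonical
    // bit pattern 0x7ff8000000000000L, which implies a NaN check per element.
    public static void canonicalBits(double[] in, long[] out) {
        for (int i = 0; i < in.length; i++) {
            out[i] = Double.doubleToLongBits(in[i]);
        }
    }

    public static void main(String[] args) {
        // Second element is a quiet NaN with a non-canonical payload.
        double[] in = {1.0, Double.longBitsToDouble(0x7ff8000000000001L)};
        long[] raw = new long[in.length];
        long[] canon = new long[in.length];
        rawBits(in, raw);
        canonicalBits(in, canon);
        for (int i = 0; i < in.length; i++) {
            System.out.println(Long.toHexString(raw[i]) + " " + Long.toHexString(canon[i]));
        }
    }
}
```

Whether a given loop actually vectorizes depends on the platform and the C2 flags in use, so this only illustrates the source-level difference, not the compiler's behavior.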

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26457#discussion_r2265119804
