On Fri, 5 Sep 2025 10:12:35 GMT, Aleksey Shipilev <sh...@openjdk.org> wrote:

>> Intrinsic support for `VectorMask.laneIsSet` with a **variable** input index 
>> was introduced in PR #14200, but was inadvertently broken by PR #25673. This 
>> PR restores the intrinsic functionality and adds some JTReg tests.
>> 
>> Benchmarks on an Nvidia Grace machine with 128-bit SVE:
>> 
>> Benchmark                        Unit    Before       Score Error  After        Score Error  Uplift
>> microMaskLaneIsSetByte128_var    ops/ms  21702.14415  91.902159    103472.9391  36.057447    4.767867
>> microMaskLaneIsSetByte64_var     ops/ms  21468.51868  107.94177    103365.6561  69.47736     4.814754
>> microMaskLaneIsSetDouble128_var  ops/ms  77489.32791  153.242699   413499.4127  311.854079   5.336211
>> microMaskLaneIsSetFloat128_var   ops/ms  41034.95204  399.421823   206840.0988  74.702234    5.040583
>> microMaskLaneIsSetFloat64_var    ops/ms  77607.40268  175.938921   413745.3001  149.716794   5.33126
>> microMaskLaneIsSetInt128_var     ops/ms  41452.48893  76.143208    206845.9754  59.371129    4.989953
>> microMaskLaneIsSetInt64_var      ops/ms  77726.2542   173.180518   413427.8838  363.575023   5.319024
>> microMaskLaneIsSetLong128_var    ops/ms  77646.11218  177.496587   413403.4404  236.609314   5.3242
>> microMaskLaneIsSetShort128_var   ops/ms  21374.93265  48.13101     103417.4618  34.827021    4.838259
>> microMaskLaneIsSetShort64_var    ops/ms  41066.19395  353.320621   206801.109   106.408938   5.035799
>> 
>> 
>> Benchmarks on an Intel 6444Y machine with 512-bit AVX3:
>> 
>> Benchmark                        Unit    Before       Score Error  After        Score Error  Uplift
>> microMaskLaneIsSetByte128_var    ops/ms  57658.45497  240.209309   211643.8406  29.214532    3.670647
>> microMaskLaneIsSetByte256_var    ops/ms  57451.68169  116.994128   211609.4652  160.48513    3.683259
>> microMaskLaneIsSetByte512_var    ops/ms  57530.22411  311.63868    199802.8084  408.144015   3.473005
>> microMaskLaneIsSetByte64_var     ops/ms  57642.2672   161.406221   205252.4464  196.86852    3.560797
>> microMaskLaneIsSetDouble256_var  ops/ms  114401.3789  231.797375   361400.344   565.593984   3.159055
>> microMaskLaneIsSetDouble512_var  ops/ms  57379.27882  159.699503   211476.1138  136.980026   3.685583
>> microMaskLaneIsSetFloat128_var   ops/ms  113943.9512  141.062663   360855.3915  494.471996   3.166955
>> microMaskLaneIsSetFloat256_var   ops/ms  57682.78182  138.142053   211659.5098  30.167972    3.66937
>> microMaskLaneIsSetFloat512_var   ops/ms  57617.66405  301.748599   211246.8588  597.18949    3.666355
>> microMaskLaneIsSetInt128_var     ops/ms  113914.5062  118.681382   360856.4465  555.097397   3.167783
>> microMaskLaneIsSetInt256_var     ops/ms  57681.79883  112.391639   211555.6742  217.556981   3.667633
>> microMaskLaneIsSetInt512_var     ops/ms  573...
>
> test/micro/org/openjdk/bench/jdk/incubator/vector/VectorExtractBenchmark.java line 34:
> 
>> 32: @Warmup(iterations = 5, time = 1)
>> 33: @Measurement(iterations = 5, time = 1)
>> 34: @Fork(value = 1, jvmArgs = {"--add-modules=jdk.incubator.vector"})
> 
> Don't do 1 fork, do at least 3.

The test results show that this test is stable, so I think forking once is 
enough? We have many JMH benchmarks that fork once.
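The Uplift column in the quoted tables appears to be the ratio After / Before of the two JMH scores; for example, 103472.9391 / 21702.14415 is about 4.767867 for microMaskLaneIsSetByte128_var on Grace. A minimal sketch that recomputes the ratio for two rows copied from those tables (the class and method names are hypothetical, chosen only for illustration):

```java
import java.util.Locale;

// Hypothetical helper: recomputes the "Uplift" column, assuming Uplift = After / Before.
public class UpliftCheck {

    // Ratio of the post-fix score to the pre-fix score (both in ops/ms, higher is better).
    static double uplift(double before, double after) {
        return after / before;
    }

    public static void main(String[] args) {
        // Rows copied from the quoted tables: {before, after, reported uplift}.
        double[][] rows = {
            {21702.14415, 103472.9391, 4.767867},  // microMaskLaneIsSetByte128_var, Grace
            {57658.45497, 211643.8406, 3.670647},  // microMaskLaneIsSetByte128_var, Intel
        };
        for (double[] r : rows) {
            double computed = uplift(r[0], r[1]);
            // Locale.ROOT keeps '.' as the decimal separator regardless of the default locale.
            System.out.printf(Locale.ROOT, "computed %.6f, reported %.6f%n", computed, r[2]);
        }
    }
}
```

Running it with the single-file source launcher (`java UpliftCheck.java`, JDK 11 or later) prints each computed ratio next to the reported one, which makes it easy to spot-check other rows the same way.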

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27113#discussion_r2328949227
