Re: RFR: 8349452: Fix performance regression for Arrays.fill() with AVX512 [v13]

Emanuel Peter Mon, 19 Jan 2026 00:16:42 -0800

On Fri, 16 Jan 2026 20:31:28 GMT, Srinivas Vamsi Parasa wrote:


>> Srinivas Vamsi Parasa has updated the pull request incrementally with one 
>> additional commit since the last revision:
>> 
>>   Update ALL of ArraysFill JMH micro
>
> Also, we can see the benefit of using unmasked stores (this PR) instead of 
> masked vector stores (existing implementation) when we update the 
> ArraysFill.java JMH micro-benchmark to perform fill (write) followed by read 
> of the filled data as shown below using short array fill as an example:
> 
> 
> @Benchmark
> public short testShortFill() {
>     Arrays.fill(testShortArray, (short) -1);
>     return (short) (testShortArray[0] + testShortArray[size - 1]);
> }
> 
> ### Table shows throughput (ops/ms); **(Higher is better)** 
> Benchmark (ops/ms), MaxVectorSize = 32 | SIZE | +OptimizeFill (Masked Store) | +OptimizeFill (Unmasked Store - This PR) | Delta
> -- | -- | -- | -- | --
> ArraysFill.testByteFill | 1 | 175381 | 342456 | 95%
> ArraysFill.testByteFill | 10 | 175421 | 264607 | 51%
> ArraysFill.testByteFill | 20 | 175447 | 271111 | 55%
> ArraysFill.testByteFill | 30 | 175454 | 253351 | 44%
> ArraysFill.testByteFill | 40 | 162429 | 273043 | 68%
> ArraysFill.testByteFill | 50 | 162443 | 251734 | 55%
> ArraysFill.testByteFill | 60 | 162454 | 248156 | 53%
> ArraysFill.testByteFill | 70 | 156659 | 236497 | 51%
> ArraysFill.testByteFill | 80 | 175403 | 269433 | 54%
> ArraysFill.testByteFill | 90 | 175422 | 230276 | 31%
> ArraysFill.testByteFill | 100 | 168662 | 252394 | 50%
> ArraysFill.testByteFill | 110 | 146182 | 217917 | 49%
> ArraysFill.testByteFill | 120 | 168693 | 239126 | 42%
> ArraysFill.testByteFill | 130 | 162378 | 166159 | 2%
> ArraysFill.testByteFill | 140 | 156569 | 168296 | 7%
> ArraysFill.testByteFill | 150 | 151214 | 167388 | 11%
> ArraysFill.testByteFill | 160 | 156594 | 173529 | 11%
> ArraysFill.testByteFill | 170 | 156590 | 167976 | 7%
> ArraysFill.testByteFill | 180 | 156561 | 173015 | 11%
> ArraysFill.testByteFill | 190 | 156601 | 173073 | 11%
> ArraysFill.testByteFill | 200 | 168277 | 174293 | 4%
> ArraysFill.testIntFill | 1 | 175403 | 334460 | 91%
> ArraysFill.testIntFill | 10 | 162437 | 273799 | 69%
> ArraysFill.testIntFill | 20 | 156636 | 273483 | 75%
> ArraysFill.testIntFill | 30 | 162440 | 243303 | 50%
> ArraysFill.testIntFill | 40 | 156592 | 175162 | 12%
> ArraysFill.testIntFill | 50 | 156585 | 168433 | 8%
> ArraysFill.testIntFill | 60 | 151193 | 195235 | 29%
> ArraysFill.testIntFill | 70 | 141406 | 167060 | 18%
> ArraysFill.testIntFill | 80 | 141406 | 167119 | 18%
> ArraysFill.testIntFill | 90 | 141437 | 166976 | 18%
> ArraysFill.testIntFill | 100 | 168349 | 173943 | 3%
> ArraysFill.testIntFill | 110 | 132864 | 173096 | 30%
> ArraysFill.testIntFill | 120 | 128972 | 173722 | 35%
> ArraysFill....

@vamsi-parasa Thanks for the extra data!

Do I see this right? In the plots 
[here](https://github.com/openjdk/jdk/pull/28442#issuecomment-3761659799), the 
masked curve lies lower, i.e. better, than the unmasked curve (those plots 
measure ns/op, so lower is better). But in your tables 
[here](https://github.com/openjdk/jdk/pull/28442#issuecomment-3761712841) you 
are measuring ops/ms (higher is better), and you get the opposite trend: masked 
is slower than unmasked.

Can you explain the difference between the two results?
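
To compare the two sets of numbers on one scale, the ops/ms values can be 
converted to ns/op. Below is a minimal sketch of that conversion, not part of 
the PR: the class and method names are made up for illustration, and the two 
inputs are taken from the first row of the quoted table (testByteFill, 
SIZE = 1). It relies only on 1 ms = 1,000,000 ns.

```java
// Illustrative helper (hypothetical names, not from the PR or from JMH).
class ThroughputUnits {

    // 1 ms = 1_000_000 ns, so ns/op = 1_000_000 / (ops/ms).
    static double opsPerMsToNsPerOp(double opsPerMs) {
        return 1_000_000.0 / opsPerMs;
    }

    // Relative throughput gain of "candidate" over "baseline", in percent.
    static double deltaPercent(double baselineOpsPerMs, double candidateOpsPerMs) {
        return (candidateOpsPerMs / baselineOpsPerMs - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        // First row of the quoted table: ArraysFill.testByteFill, SIZE = 1.
        double maskedOpsPerMs = 175381;
        double unmaskedOpsPerMs = 342456;

        System.out.printf("masked:   %.2f ns/op%n", opsPerMsToNsPerOp(maskedOpsPerMs));
        System.out.printf("unmasked: %.2f ns/op%n", opsPerMsToNsPerOp(unmaskedOpsPerMs));
        System.out.printf("delta:    %.0f%%%n", deltaPercent(maskedOpsPerMs, unmaskedOpsPerMs));
    }
}
```

In ns/op terms, a higher ops/ms number always maps to a lower ns/op number, so 
the unit change alone cannot flip which variant looks better. The opposite 
trends must come from something else that differs between the two runs.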

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28442#issuecomment-3767004043
