On Fri, 16 Jan 2026 20:14:31 GMT, Srinivas Vamsi Parasa <[email protected]> 
wrote:

>> The goal of this PR is to fix the performance regression in Arrays.fill() 
>> x86 stubs caused by masked AVX stores. The fix is to replace the masked AVX 
>> stores with store instructions without masks (i.e. unmasked stores). 
>> `fill32_masked()` and `fill64_masked()` stubs are replaced with 
>> `fill32_unmasked()` and `fill64_unmasked()` respectively.
>> 
>> To speedup unmasked stores, array fills for sizes < 64 bytes are broken down 
>> into sequences of 32B, 16B, 8B, 4B, 2B and 1B stores, depending on the size.
>> 
>> 
>> ### **Performance comparison for byte array fills in a loop for 1 million 
>> times**
>> 
>> 
>> UseAVX=3   ByteArray Size | +OptimizeFill    (Masked store   stub)     
>> [secs] | -OptimizeFill   (No stub)   [secs] | --->This PR: +OptimizeFill   
>> (Unmasked store   stub)   [secs]
>> -- | -- | -- | --
>> 1 | 0.46 | 0.14 | 0.189
>> 2 | 0.46 | 0.16 | 0.191
>> 3 | 0.46 | 0.176 | 0.199
>> 4 | 0.46 | 0.244 | 0.212
>> 5 | 0.46 | 0.29 | 0.364
>> 10 | 0.46 | 0.58 | 0.354
>> 15 | 0.46 | 0.42 | 0.325
>> 16 | 0.46 | 0.46 | 0.281
>> 17 | 0.21 | 0.5 | 0.365
>> 20 | 0.21 | 0.37 | 0.326
>> 25 | 0.21 | 0.59 | 0.343
>> 31 | 0.21 | 0.53 | 0.317
>> 32 | 0.21 | 0.58 | 0.249
>> 35 | 0.5 | 0.77 | 0.303
>> 40 | 0.5 | 0.61 | 0.312
>> 45 | 0.5 | 0.52 | 0.364
>> 48 | 0.5 | 0.66 | 0.283
>> 49 | 0.22 | 0.69 | 0.367
>> 50 | 0.22 | 0.78 | 0.344
>> 55 | 0.22 | 0.67 | 0.332
>> 60 | 0.22 | 0.67 | 0.312
>> 64 | 0.22 | 0.82 | 0.253
>> 70 | 0.51 | 1.1 | 0.394
>> 80 | 0.49 | 0.89 | 0.346
>> 90 | 0.225 | 0.68 | 0.385
>> 100 | 0.54 | 1.09 | 0.364
>> 110 | 0.6 | 0.98 | 0.416
>> 120 | 0.26 | 0.75 | 0.367
>> 128 | 0.266 | 1.1 | 0.342
>
> Srinivas Vamsi Parasa has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Update ALL of ArraysFill JMH micro

I'm expecting to see a small regression in a write-only fill, and a larger 
improvement in write+read fill, but we didn't present the data in a way that 
makes it easy to compare those two tests. So we should present the graphed data 
as a table as well. Then we can discuss how common the write+read fill case is.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28442#issuecomment-3781378167

Reply via email to