On Fri, 16 Jan 2026 20:14:31 GMT, Srinivas Vamsi Parasa <[email protected]> wrote:
>> The goal of this PR is to fix the performance regression in Arrays.fill()
>> x86 stubs caused by masked AVX stores. The fix is to replace the masked AVX
>> stores with store instructions without masks (i.e. unmasked stores).
>> `fill32_masked()` and `fill64_masked()` stubs are replaced with
>> `fill32_unmasked()` and `fill64_unmasked()` respectively.
>>
>> To speedup unmasked stores, array fills for sizes < 64 bytes are broken down
>> into sequences of 32B, 16B, 8B, 4B, 2B and 1B stores, depending on the size.
>>
>>
>> ### **Performance comparison for byte array fills in a loop for 1 million times**
>>
>>
>> UseAVX=3 ByteArray Size | +OptimizeFill (Masked store stub) [secs] | -OptimizeFill (No stub) [secs] | --->This PR: +OptimizeFill (Unmasked store stub) [secs]
>> -- | -- | -- | --
>> 1 | 0.46 | 0.14 | 0.189
>> 2 | 0.46 | 0.16 | 0.191
>> 3 | 0.46 | 0.176 | 0.199
>> 4 | 0.46 | 0.244 | 0.212
>> 5 | 0.46 | 0.29 | 0.364
>> 10 | 0.46 | 0.58 | 0.354
>> 15 | 0.46 | 0.42 | 0.325
>> 16 | 0.46 | 0.46 | 0.281
>> 17 | 0.21 | 0.5 | 0.365
>> 20 | 0.21 | 0.37 | 0.326
>> 25 | 0.21 | 0.59 | 0.343
>> 31 | 0.21 | 0.53 | 0.317
>> 32 | 0.21 | 0.58 | 0.249
>> 35 | 0.5 | 0.77 | 0.303
>> 40 | 0.5 | 0.61 | 0.312
>> 45 | 0.5 | 0.52 | 0.364
>> 48 | 0.5 | 0.66 | 0.283
>> 49 | 0.22 | 0.69 | 0.367
>> 50 | 0.22 | 0.78 | 0.344
>> 55 | 0.22 | 0.67 | 0.332
>> 60 | 0.22 | 0.67 | 0.312
>> 64 | 0.22 | 0.82 | 0.253
>> 70 | 0.51 | 1.1 | 0.394
>> 80 | 0.49 | 0.89 | 0.346
>> 90 | 0.225 | 0.68 | 0.385
>> 100 | 0.54 | 1.09 | 0.364
>> 110 | 0.6 | 0.98 | 0.416
>> 120 | 0.26 | 0.75 | 0.367
>> 128 | 0.266 | 1.1 | 0.342
>
> Srinivas Vamsi Parasa has updated the pull request incrementally with one
> additional commit since the last revision:
>
>   Update ALL of ArraysFill JMH micro

I'm expecting to see a small regression in a write-only fill, and a larger improvement in write+read fill, but we didn't present the data in a way that makes it easy to compare those two tests.
So we should present the graphed data as a table as well. Then we can discuss how common the write+read fill case is.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28442#issuecomment-3781378167
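
For readers following the thread: the PR description says fills smaller than 64 bytes are broken down into 32B, 16B, 8B, 4B, 2B and 1B stores depending on the size. The sketch below shows one plausible reading of that, a greedy widest-first split. It is an illustration only, not the stub code from the PR; the class name `FillDecomposition` and the method `storesFor` are hypothetical, and the real x86 stub may order or overlap its stores differently.

```java
import java.util.ArrayList;
import java.util.List;

public class FillDecomposition {
    // Store widths in bytes, widest first, as listed in the PR description.
    static final int[] WIDTHS = {32, 16, 8, 4, 2, 1};

    // Returns the store widths a greedy split would use for `size` bytes.
    // Assumption: sizes are in [0, 64), so each width is needed at most once.
    static List<Integer> storesFor(int size) {
        if (size < 0 || size >= 64) {
            throw new IllegalArgumentException("size must be in [0, 64): " + size);
        }
        List<Integer> stores = new ArrayList<>();
        int remaining = size;
        for (int width : WIDTHS) {
            if (remaining >= width) {
                stores.add(width);
                remaining -= width;
            }
        }
        return stores;
    }

    public static void main(String[] args) {
        // Print the split for a few of the sizes that appear in the table.
        for (int size : new int[] {1, 5, 17, 35, 60}) {
            System.out.println(size + " bytes -> " + storesFor(size));
        }
    }
}
```

Under this reading a 35-byte fill becomes one 32B, one 2B and one 1B store, so the number of stores tracks the number of set bits in the size. Whether that explains any of the variation in the "This PR" column is not something the sketch can show.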
