On Fri, 9 May 2025 15:39:35 GMT, Andrew Haley <a...@openjdk.org> wrote:
>> This intrinsic is generally faster than the current implementation for
>> Panama segment operations for all writes larger than about 8 bytes in size,
>> increasing to more than 2* the performance on larger memory blocks on
>> Graviton 2, between "panama" (C2 generated, what we use now) and "unsafe"
>> (this intrinsic).
>>
>>
>> Benchmark                       (aligned)  (size)  Mode  Cnt     Score    Error  Units
>> MemorySegmentFillUnsafe.panama       true  262143  avgt   10  7295.638 ±  0.422  ns/op
>> MemorySegmentFillUnsafe.panama      false  262143  avgt   10  8345.300 ± 80.161  ns/op
>> MemorySegmentFillUnsafe.unsafe       true  262143  avgt   10  2930.594 ±  0.180  ns/op
>> MemorySegmentFillUnsafe.unsafe      false  262143  avgt   10  3136.828 ±  0.232  ns/op
>
> Andrew Haley has updated the pull request incrementally with one additional
> commit since the last revision:
>
>   generate_unsafecopy_common_error_exit

src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 2611:

> 2609:
> 2610:     __ subs(count, count, 64);
> 2611:     __ add(dest, dest, 64);

This add could be elided by employing a post-increment on dest in each of the
two writes above, saving on code size. Is there a reason to prefer the add?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25147#discussion_r2084225622
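
For readers who want the suggestion spelled out, here is a minimal C++ model of the two loop shapes being compared. It is only a sketch, not the stub's code: the two writes are not shown in the quoted hunk, so the 32-byte store width, the helper `store32`, and the function names below are all assumptions made for illustration. The model shows that both shapes write the same bytes; it says nothing about code size or speed on real hardware.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for one 32-byte store instruction (assumed width).
static void store32(std::uint8_t* p, std::uint8_t value) {
    std::memset(p, value, 32);
}

// Shape A: two stores at fixed offsets, then a separate add on dest,
// mirroring the subs/add pair in the quoted hunk.
void fill_explicit_add(std::uint8_t* dest, std::size_t count, std::uint8_t value) {
    assert(count % 64 == 0);  // the model only handles whole 64-byte blocks
    while (count != 0) {
        store32(dest, value);
        store32(dest + 32, value);
        count -= 64;
        dest += 64;           // the add the review comment asks about
    }
}

// Shape B: each store advances dest itself, modelling post-increment
// addressing, so no separate add is needed.
void fill_post_increment(std::uint8_t* dest, std::size_t count, std::uint8_t value) {
    assert(count % 64 == 0);
    while (count != 0) {
        store32(dest, value); dest += 32;  // store, then bump by 32
        store32(dest, value); dest += 32;  // store, then bump by 32
        count -= 64;
    }
}

// Convenience check: both shapes fill `blocks` 64-byte blocks identically.
bool same_result(std::size_t blocks, std::uint8_t value) {
    std::vector<std::uint8_t> a(blocks * 64, 0), b(blocks * 64, 0);
    fill_explicit_add(a.data(), a.size(), value);
    fill_post_increment(b.data(), b.size(), value);
    return a == b;
}
```

In the model the two shapes are interchangeable in effect, so any difference in the real stub would come from the instruction count and from how the target core handles base-register writeback.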