On Mon, 12 May 2025 09:37:24 GMT, Andrew Haley <a...@openjdk.org> wrote:
>> This intrinsic is generally faster than the current implementation for
>> Panama segment operations for all writes larger than about 8 bytes in size,
>> increasing to more than 2* the performance on larger memory blocks on
>> Graviton 2, between "panama" (C2 generated, what we use now) and "unsafe"
>> (this intrinsic).
>>
>>
>> Benchmark                       (aligned)  (size)  Mode  Cnt     Score     Error  Units
>> MemorySegmentFillUnsafe.panama       true  262143  avgt   10  7295.638 ±  0.422  ns/op
>> MemorySegmentFillUnsafe.panama      false  262143  avgt   10  8345.300 ± 80.161  ns/op
>> MemorySegmentFillUnsafe.unsafe       true  262143  avgt   10  2930.594 ±  0.180  ns/op
>> MemorySegmentFillUnsafe.unsafe      false  262143  avgt   10  3136.828 ±  0.232  ns/op
>
> Andrew Haley has updated the pull request incrementally with one additional
> commit since the last revision:
>
>   Stub stack frame

src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 2619:

> 2617:
> 2618:   __ bind(tail);
> 2619:   // __ add(count, count, 64);

I can see why you commented this out (and prefer that to deleting it).
However, a comment explaining why it is not needed might avoid maintainers
being side-tracked.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25147#discussion_r2084293642