On Fri, 9 May 2025 15:39:35 GMT, Andrew Haley <a...@openjdk.org> wrote:

>> This intrinsic is generally faster than the current implementation of
>> Panama segment operations for all writes larger than about 8 bytes, reaching
>> more than 2x the performance on larger memory blocks on Graviton 2. The
>> comparison below is between "panama" (C2-generated code, what we use now)
>> and "unsafe" (this intrinsic).
>> 
>> 
>> Benchmark                       (aligned)  (size)  Mode  Cnt     Score    Error  Units
>> MemorySegmentFillUnsafe.panama       true  262143  avgt   10  7295.638 ±  0.422  ns/op
>> MemorySegmentFillUnsafe.panama      false  262143  avgt   10  8345.300 ± 80.161  ns/op
>> MemorySegmentFillUnsafe.unsafe       true  262143  avgt   10  2930.594 ±  0.180  ns/op
>> MemorySegmentFillUnsafe.unsafe      false  262143  avgt   10  3136.828 ±  0.232  ns/op
>
> Andrew Haley has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   generate_unsafecopy_common_error_exit

src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 2611:

> 2609: 
> 2610:       __ subs(count, count, 64);
> 2611:       __ add(dest, dest, 64);

This add could be elided by using post-increment (post-indexed) addressing on
dest in each of the two writes above, saving an instruction. Is there a reason
to prefer the separate add?
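For illustration only, here is a hypothetical C++ model of the two loop shapes. It assumes each of the "two writes" stores 32 bytes, so together they cover the 64 bytes that the add skips, and it does not model the tail of fewer than 64 bytes, which is assumed to be handled elsewhere in the stub. The function names are made up for this sketch. C++ cannot force a particular AArch64 addressing mode, so this shows only that the two pointer-update schemes write the same bytes, not what code a compiler would emit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical model, not the stub itself. Scheme A: both 32-byte stores use
// fixed offsets from dest, then a separate add advances dest by 64.
void fill_offset_then_add(uint8_t* dest, std::ptrdiff_t count, uint8_t value) {
  assert(count >= 0);
  uint8_t block[32];
  std::memset(block, value, sizeof block);
  while (count >= 64) {
    std::memcpy(dest + 0, block, 32);   // like a store to [dest, #0]
    std::memcpy(dest + 32, block, 32);  // like a store to [dest, #32]
    count -= 64;                        // subs count, count, 64
    dest += 64;                         // add dest, dest, 64
  }
}

// Scheme B: each store advances dest itself (post-increment), no extra add.
void fill_post_increment(uint8_t* dest, std::ptrdiff_t count, uint8_t value) {
  assert(count >= 0);
  uint8_t block[32];
  std::memset(block, value, sizeof block);
  while (count >= 64) {
    std::memcpy(dest, block, 32);  // like a store to [dest], #32
    dest += 32;
    std::memcpy(dest, block, 32);  // like a store to [dest], #32
    dest += 32;
    count -= 64;
  }
}
```

One trade-off worth weighing: with writeback, the second store's address depends on the first store's base update, whereas offset addressing lets both stores use the same unmodified base. Whether that matters in practice depends on the microarchitecture.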

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25147#discussion_r2084225622
