Re: RFR: 8338967: Improve performance for MemorySegment::fill [v10]

Per Minborg Tue, 03 Sep 2024 01:41:59 -0700

On Mon, 2 Sep 2024 09:32:56 GMT, Maurizio Cimadamore <[email protected]> 
wrote:


>> If I run:
>> 
>> 
>>     @Benchmark
>>     public long shift() {
>>         return ELEM_SIZE << 56 | ELEM_SIZE << 48 | ELEM_SIZE << 40 | 
>> ELEM_SIZE << 32 | ELEM_SIZE << 24 | ELEM_SIZE << 16 | ELEM_SIZE << 8 | 
>> ELEM_SIZE;
>>     }
>> 
>>     @Benchmark
>>     public long mul() {
>>         return ELEM_SIZE * 0xFFFF_FFFF_FFFFL;
>>     }
>> 
>> Then I get:
>> 
>> Benchmark       (ELEM_SIZE)  Mode  Cnt  Score   Error  Units
>> TestFill.mul             31  avgt   30  0.586 ± 0.045  ns/op
>> TestFill.shift           31  avgt   30  0.938 ± 0.017  ns/op
>> 
>> On my M1 machine.
>
> I found similar small improvements to be had (I wrote about them offline) 
> when replacing the bitwise-based tests (e.g. `(foo & 4) != 0`) with a more 
> explicit check for `remainingBytes >= 4`. Seems like bitwise operations are 
> not as optimized (or perhaps the assembly instructions for them are overall 
> more convoluted - I haven't checked).

I've tried:

    final long longValue = Byte.toUnsignedLong(value) * 0x0101010101010101L;

but it performed the same as explicit bit shifting on M1.
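
For reference, here is a minimal sketch of why the two forms are interchangeable. The class and method names are illustrative only and are not taken from the PR. Because the unsigned byte is below 256, multiplying by 0x0101010101010101L copies it into each of the eight byte lanes without any carry between lanes, which matches the shift-and-or form:

```java
public class ByteBroadcast {

    // Replicate the byte into all eight byte lanes with explicit shifts.
    static long broadcastShift(byte value) {
        long b = Byte.toUnsignedLong(value);
        return b << 56 | b << 48 | b << 40 | b << 32
             | b << 24 | b << 16 | b << 8 | b;
    }

    // Same result with a single multiply: each 0x01 byte in the
    // constant contributes one copy of the value to its lane.
    static long broadcastMul(byte value) {
        return Byte.toUnsignedLong(value) * 0x0101010101010101L;
    }

    public static void main(String[] args) {
        // Check every possible byte value.
        for (int i = 0; i < 256; i++) {
            byte v = (byte) i;
            if (broadcastShift(v) != broadcastMul(v)) {
                throw new AssertionError("mismatch for byte " + i);
            }
        }
        System.out.println(Long.toHexString(broadcastMul((byte) 0x1F)));
        // prints 1f1f1f1f1f1f1f1f
    }
}
```

This only shows that the results are equal; it says nothing about which form the JIT compiles to faster code on a given machine.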

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20712#discussion_r1741664877
