On Fri, 30 Aug 2024 14:15:24 GMT, Maurizio Cimadamore <[email protected]>
wrote:
>> src/java.base/share/classes/jdk/internal/foreign/AbstractMemorySegmentImpl.java
>> line 208:
>>
>>> 206: }
>>> 207: final long u = Byte.toUnsignedLong(value);
>>> 208: final long longValue = u << 56 | u << 48 | u << 40 | u <<
>>> 32 | u << 24 | u << 16 | u << 8 | u;
>>
>> this can be u * 0xFFFFFFFFFFFFL if value != 0 and just 0L if not: not sure
>> if fast(er), need to measure.
>>
>> Most of the time filling is happy with 0 since zeroing is the most common
>> case
>
> It's a clever trick. However, I was looking at similar tricks and found that
> the time spent here is irrelevant (e.g. I tried to always force `0` as the
> value, and couldn't see any difference).
If I run:
    @Benchmark
    public long shift() {
        return ELEM_SIZE << 56 | ELEM_SIZE << 48 | ELEM_SIZE << 40 | ELEM_SIZE << 32
             | ELEM_SIZE << 24 | ELEM_SIZE << 16 | ELEM_SIZE << 8 | ELEM_SIZE;
    }

    @Benchmark
    public long mul() {
        return ELEM_SIZE * 0xFFFF_FFFF_FFFFL;
    }
Then I get:
Benchmark       (ELEM_SIZE)  Mode  Cnt  Score   Error  Units
TestFill.mul             31  avgt   30  0.586 ± 0.045  ns/op
TestFill.shift           31  avgt   30  0.938 ± 0.017  ns/op
On my M1 machine.
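
A side note on the constant, as a minimal sketch rather than the actual JDK code: the multiplier that reproduces the shift/OR broadcast bit for bit is 0x0101_0101_0101_0101L, whereas 0xFFFF_FFFF_FFFFL yields a different value. The benchmark above therefore compares the cost of one multiply against seven shifts and ORs, not two equivalent computations. The class and method names below are hypothetical and exist only for illustration.

```java
// Hypothetical illustration class; not part of AbstractMemorySegmentImpl.
final class ByteBroadcastSketch {

    // Replicate one byte into all eight byte lanes of a long using shifts and ORs,
    // mirroring the expression quoted from line 208.
    static long broadcastShift(byte value) {
        final long u = Byte.toUnsignedLong(value);
        return u << 56 | u << 48 | u << 40 | u << 32 | u << 24 | u << 16 | u << 8 | u;
    }

    // Same result with a single multiply: each 0x01 byte of the constant
    // copies u into one lane, and no carries occur because u < 256.
    static long broadcastMul(byte value) {
        return Byte.toUnsignedLong(value) * 0x0101_0101_0101_0101L;
    }

    public static void main(String[] args) {
        // Exhaustively check all 256 byte values.
        for (int b = Byte.MIN_VALUE; b <= Byte.MAX_VALUE; b++) {
            if (broadcastShift((byte) b) != broadcastMul((byte) b)) {
                throw new AssertionError("mismatch for byte " + b);
            }
        }
        System.out.println(Long.toHexString(broadcastMul((byte) 31))); // prints 1f1f1f1f1f1f1f1f
    }
}
```

Whether the multiply form is worth using in the fill path is a separate question that only a benchmark of the full fill operation can answer.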
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/20712#discussion_r1740564110