On Thu, 12 Sep 2024 11:32:22 GMT, Maurizio Cimadamore <[email protected]>
wrote:
>> src/java.base/share/classes/jdk/internal/foreign/SegmentBulkOperations.java
>> line 204:
>>
>>> 202: // This gives about 20% performance increase for large values
>>> of `length`.
>>> 203: // On non-Aarch64 architectures, the unroll code will be
>>> eliminated at compile time.
>>> 204: if (Architecture.isAARCH64() && NATIVE_THRESHOLD_MISMATCH >
>>> 64) {
>>
>> I'm a bit opposed to this - as it goes in the direction to add a lot of
>> transient complexity when in reality the underlying issue is that aarch64
>> mismatch intrinsics should be fixed. Tinkering with thresholds is
>> borderline, but still acceptable - having different implementations one per
>> platform starts to look "more wrong".
>
> In other words, I don't think the goal of this (and related) PR is "improve
> mismatch so that it blows other alternatives - like Unsafe, or array" out of
> the water - as much as it is "make sure that using MemorySegment::mismatch is
> competitive with other offerings".
Then, an interesting part of these PRs is that we have uncovered quite a lot of
issues both with our intrinsics (e.g. `fill` is not vectorized and works badly
on Windows, mismatch works poorly on aarch64) *and* missed optimization
opportunities - e.g. "short" segment loops are not optimized as tightly as they
could. But it is not the job of these PRs to fix all these issues. The main
focus remain: for small sizes it is not worth going down intrinsics-lane. Let's
stick to it (in the interest of keeping the review focused).
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/20848#discussion_r1756682363