Re: [External] : Proposal for SIMD/vectorized implementations for primitive array operations

Arnav Somaghatta Sat, 06 Jun 2026 14:58:01 -0700

Hi David,

Thanks for your guidance on the proposal and for nudging me toward the core question of
whether the Vector API is mature enough to use for this now. I spent today digging into how the JDK
currently implements the byte array operations I had in mind, and then ran benchmarks comparing them
with both scalar and Vector API code.

What I found in the JDK:

- Arrays.mismatch(byte[], byte[]) and the byte array hash code path delegate to
  ArraysSupport.mismatch / ArraysSupport.hashCode, which in turn use vectorizedMismatch /
  vectorizedHashCode plus a short scalar tail for the remaining elements.
- The NIO buffer mismatch helpers in java.nio.BufferMismatch follow a similar pattern: they call
  ScopedMemoryAccess.vectorizedMismatch when alignment/order conditions are right, then finish with a
  scalar loop.
- So for both arrays and buffers, the hot paths are already using intrinsic-friendly,
vectorized implementations rather than straight scalar loops.

What I measured:

In a small JMH project (JDK 21.0.10, JMH 1.37, 1 fork, 3 warmup + 5 measurement iterations),
I compared three implementations of byte[] hash code:

- Arrays.hashCode(byte[]) (current JDK implementation),
- a naive scalar loop, and
- a simple Vector API prototype using ByteVector.SPECIES_PREFERRED over the array.

For array sizes 16, 64, 256, and 1024 bytes, I saw:

- Arrays.hashCode was roughly 3-12x faster than the naive scalar loop across these sizes.
- The Vector API version was basically on par with the scalar loop: slightly better at very small
  sizes, roughly equal or a bit worse at larger sizes.
- At all sizes, the Vector API prototype was significantly slower than the existing JDK
  implementation.

So for this particular operation, on this JVM and hardware, a straightforward public Vector API
implementation does not beat the current intrinsic-based implementation.

My conclusion to your question:

For the byte[] mismatch/hash family that I initially targeted, the answer seems to be: not
yet. The JDK’s existing intrinsics and internal vectorized helpers are already much better than what
I can achieve with a simple Vector API implementation today, so replacing them with public Vector API
code would likely be a regression rather than an improvement.

Given that, I don't intend to propose changes to Arrays.hashCode / Arrays.mismatch or the
corresponding buffer mismatch paths. Instead, I'm now looking for other areas where the
implementation still appears more scalar and may have more room, for example some of the string
internals (e.g., specific UTF‑16 indexOf / comparison paths) that are not already wired through
ArraysSupport or intrinsics.

If you have any preferences or warnings about parts of StringLatin1 / StringUTF16 that are already
slated for work, I'll make sure not to duplicate effort. Otherwise, I'll pick one small,
well-defined operation, build JMH benchmarks around it, and see if there's a real opportunity there
before drafting a concrete change.

Thanks again for the direction; this exercise was very helpful for finding what's already "done"
in the JDK versus where it might still make sense to explore.

Best,
Arnav

On Jun 6, 2026, at 2:08 PM,
David Alayachew <[email protected]> wrote:

Good to hear from you again Arnav,

This is a much more viable proposal. A performance analysis with metrics to back it up is very
likely to be a welcome change. And even if it's not the exact path taken, the information gathered
is useful on its own.

The only downside is that the Vector tools in the JDK are still in preview, and thus, may or may
not be usable for this implementation *yet*. Nonetheless, I assure you that your thinking is in the
right place.

So, I'd start with that question -- is the Vector API mature enough that it can be used
for this **now**, or not?

On Sat, Jun 6, 2026, 1:48 PM Arnav Somaghatta <
[email protected] > wrote:

Hello,

My name is Arnav Somaghatta and I am a rising developer who is 14 years old and is interested in
contributing performance improvements to existing primitive array operations in the JDK, especially
by introducing vectorized fast paths where applicable.

Based on benchmarking work with small primitive arrays, I have observed that certain
operations, such as scans and comparisons, still rely on scalar loops that may leave
SIMD/vectorization potential unused in some cases.

As a small concrete example, I ran a JMH benchmark comparing Arrays.mismatch(byte[], byte[])
against an equivalent naive scalar implementation across 64-byte, 256-byte, 1024-byte, and
8192-byte arrays, each with a single mismatch at the end. On my PC, using JDK 21 and JMH 1.37,
Arrays.mismatch runs at about 4.5 ns/op, 9.9 ns/op, 28.5 ns/op, and 207.9 ns/op for those sizes,
respectively, while the naive loop takes about 15.9 ns/op, 50.9 ns/op, 216.0 ns/op, and 1656.7
ns/op, respectively. That is approximately a 3.5x speedup at 64 bytes, 5.1x at 256 bytes, 7.6x at
1024 bytes, and 8.0x at 8192 bytes.

This suggests that optimized implementations can provide substantial wins even for relatively small
primitive arrays, and I would like to explore whether similar fast paths could be applied more
broadly in core library array operations.

I am not
proposing any new public APIs. Instead, my goal is to work on improving the internal implementations
of existing methods such as:

- Arrays.mismatch(...)
- primitive array equality checks
- byte/char scan-heavy operations (like String-related internals)

My intent is to investigate whether SIMD or vector-based implementations, via the Vector API or
intrinsics where appropriate, could provide meaningful performance improvements for small to medium
arrays without negatively affecting maintainability.

I would like to do the implementation work myself; however, before I begin prototyping, I wanted to
ask whether this direction is considered viable for core libs work, and if there are specific
implementation areas that would be most appropriate to target first.

Thank you so much for your time.

Best regards,
Arnav
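
For readers following the thread: the "naive scalar" baselines mentioned in these messages were
not posted, so the following is only a minimal sketch of what such baselines might look like. The
class name ScalarBaselines and both method names are hypothetical; the code assumes the documented
contracts of Arrays.mismatch(byte[], byte[]) and Arrays.hashCode(byte[]) and is not the code that
was actually benchmarked.

```java
import java.util.Arrays;

// Hypothetical helper class; the name and both methods are illustrative only.
final class ScalarBaselines {

    // Naive scalar mismatch: index of the first differing byte, or -1 if none.
    // Mirrors the documented Arrays.mismatch contract: when one array is a
    // proper prefix of the other, the length of the shorter array is returned.
    static int mismatch(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            if (a[i] != b[i]) {
                return i;
            }
        }
        return a.length == b.length ? -1 : n;
    }

    // Naive scalar hash using the 31 * h + element polynomial that
    // Arrays.hashCode(byte[]) is documented to compute (0 for a null array).
    static int hashCode(byte[] a) {
        if (a == null) {
            return 0;
        }
        int h = 1;
        for (byte v : a) {
            h = 31 * h + v;
        }
        return h;
    }

    public static void main(String[] args) {
        byte[] x = new byte[1024];
        byte[] y = x.clone();
        y[y.length - 1] = 1; // single mismatch at the end, as in the benchmark description

        // Sanity checks that the baselines agree with the JDK methods they are compared against.
        System.out.println(mismatch(x, y) == Arrays.mismatch(x, y));
        System.out.println(hashCode(y) == Arrays.hashCode(y));
    }
}
```

Checking the baselines against the JDK methods before timing them helps ensure a benchmark compares
implementations that compute the same result; the timings themselves would still need a JMH harness
like the one described in the messages.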
