Hi David,

Thanks for your guidance on the proposal and for nudging me toward the core question of whether the Vector API is mature enough to use for this now. I spent today digging into how the JDK currently implements the byte array operations I had in mind, and then ran benchmarks comparing them with both scalar and Vector API code.

What I found in the JDK:

- Arrays.mismatch(byte[], byte[]) and the byte array hash code path delegate to ArraysSupport.mismatch / ArraysSupport.hashCode, which in turn use vectorizedMismatch / vectorizedHashCode plus a short scalar tail for the remaining elements.
- The NIO buffer mismatch helpers in java.nio.BufferMismatch follow a similar pattern: they call ScopedMemoryAccess.vectorizedMismatch when alignment/order conditions are right, then finish with a scalar loop.
- So for both arrays and buffers, the hot paths already use intrinsic-friendly, vectorized implementations rather than straight scalar loops.

What I measured:

In a small JMH project (JDK 21.0.10, JMH 1.37, 1 fork, 3 warmup + 5 measurement iterations), I compared three implementations of byte[] hash code:

- Arrays.hashCode(byte[]) (current JDK implementation),
- a naive scalar loop, and
- a simple Vector API prototype using ByteVector.SPECIES_PREFERRED over the array.

For array sizes 16, 64, 256, and 1024 bytes, I saw:

- Arrays.hashCode was roughly 3-12x faster than the naive scalar loop across these sizes.
- The Vector API version was basically on par with the scalar loop: slightly better at very small sizes, roughly equal or a bit worse at larger sizes.
- At all sizes, the Vector API prototype was significantly slower than the existing JDK implementation.

So for this particular operation, on this JVM and hardware, a straightforward public Vector API implementation does not beat the current intrinsic-based implementation.

My answer to your question:

For the byte[] mismatch/hash family that I initially targeted, the answer seems to be: not yet. The JDK’s existing intrinsics and internal vectorized helpers are already much better than what I can achieve with a simple Vector API implementation today, so replacing them with public Vector API code would likely be a regression rather than an improvement.

Given that, I don't intend to propose changes to Arrays.hashCode / Arrays.mismatch or the corresponding buffer mismatch paths. Instead, I'm now looking for other areas where the implementation still appears more scalar and may have more room, for example some of the string internals (e.g., specific UTF-16 indexOf / comparison paths) that are not already wired through ArraysSupport or intrinsics.

If you have any preferences or warnings about parts of StringLatin1 / StringUTF16 that are already slated for work, I'll make sure not to duplicate effort. Otherwise, I'll pick one small, well-defined operation, build JMH benchmarks around it, and see if there's a real opportunity there before drafting a concrete change.

Thanks again for the direction. This exercise was very helpful for finding what's already "done" in the JDK versus where it might still make sense to explore.

Best,
Arnav

On Jun 6, 2026, at 2:08 PM,
David Alayachew <[email protected]> wrote:

Good to hear from you again Arnav,

This is a much more viable proposal. A performance analysis with metrics to back it up is very likely to be a welcome change. And even if it's not the exact path taken, the information gathered is useful on its own.

The only downside is that the Vector tools in the JDK are still in preview, and thus, may or may not be usable for this implementation *yet*. Nonetheless, I assure you that your thinking is in the right place.

So, I'd start with that question -- is the Vector API mature enough that it can be used for this **now**, or not?

On Sat, Jun 6, 2026, 1:48 PM Arnav Somaghatta <
[email protected] > wrote:

Hello,

My name is Arnav Somaghatta. I am a 14-year-old rising developer interested in contributing performance improvements to existing primitive array operations in the JDK, especially by introducing vectorized fast paths where applicable.

Based on benchmarking work with small primitive arrays, I have observed that certain operations, such as scans and comparisons, still rely on scalar loops that may leave SIMD/vectorization potential unused in some cases.

As a small concrete example, I ran a JMH benchmark comparing Arrays.mismatch(byte[], byte[]) against an equivalent naive scalar implementation across 64 byte, 256 byte, 1024 byte, and 8192 byte arrays, each with a single mismatch at the end. On my PC, using JDK 21 and JMH 1.37, Arrays.mismatch runs at about 4.5 ns/op, 9.9 ns/op, 28.5 ns/op, and 207.9 ns/op for those sizes, respectively, while the naive loop takes about 15.9 ns/op, 50.9 ns/op, 216.0 ns/op, and 1656.7 ns/op, respectively. That is approximately a 3.5x speedup at 64 bytes, 5.1x at 256 bytes, 7.6x at 1024 bytes, and 8.0x at 8192 bytes.

This suggests that optimized implementations can provide substantial wins even for relatively small primitive arrays, and I would like to explore whether similar fast paths could be applied more broadly in core library array operations.

I am not proposing any new public APIs. Instead, my goal is to work on improving the internal implementations of existing methods such as:

- Arrays.mismatch(...)
- primitive array equality checks
- byte/char scan-heavy operations (like String-related internals)

My intent is to investigate whether SIMD or vector-based implementations, via the Vector API or intrinsics where appropriate, could provide meaningful performance improvements for small to medium arrays without negatively affecting maintainability.

I would like to do the implementation work myself; however, before I begin prototyping, I wanted to ask whether this direction is considered viable for core libs work, and if there are specific implementation areas that would be most appropriate to target first.

Thank you so much for your time.

Best regards,
Arnav
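
For reference, the "naive scalar" baselines mentioned in this thread might look roughly like the sketch below. This is only an illustration, not the code that was actually benchmarked: the class and method names (ScalarBaselines, naiveMismatch, naiveHashCode) are hypothetical, and the JMH harness is omitted. Each method is meant to return the same result as Arrays.mismatch(byte[], byte[]) or Arrays.hashCode(byte[]) for non-null inputs, so it can serve as the slow side of a comparison.

```java
import java.util.Arrays;

public class ScalarBaselines {

    // Naive scalar mismatch: index of the first differing byte,
    // the shorter length if one array is a proper prefix of the other,
    // or -1 if the arrays are identical. Assumes non-null arguments.
    static int naiveMismatch(byte[] a, byte[] b) {
        int len = Math.min(a.length, b.length);
        for (int i = 0; i < len; i++) {
            if (a[i] != b[i]) {
                return i;
            }
        }
        return a.length == b.length ? -1 : len;
    }

    // Naive scalar hash: the 31-based polynomial documented for
    // Arrays.hashCode(byte[]). Assumes a non-null argument.
    static int naiveHashCode(byte[] a) {
        int h = 1;
        for (byte v : a) {
            h = 31 * h + v;
        }
        return h;
    }

    public static void main(String[] args) {
        // Same shape as the benchmark input: a single mismatch at the end.
        byte[] x = new byte[1024];
        byte[] y = x.clone();
        y[y.length - 1] = 1;

        // Both lines should agree with the JDK implementations.
        System.out.println(naiveMismatch(x, y) == Arrays.mismatch(x, y));
        System.out.println(naiveHashCode(y) == Arrays.hashCode(y));
    }
}
```

In a real measurement, each method would sit in its own JMH @Benchmark next to the corresponding Arrays call, with the array size as a @Param.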
