jinhyukify commented on PR #7740: URL: https://github.com/apache/hbase/pull/7740#issuecomment-3935626270
Hello! After re-running the benchmarks, here is some additional context I'd like to share.

I initially thought the JDK version might limit our options, but I realized that starting from HBase 3+ we are already on JDK 17, so Java version compatibility is no longer an issue. This made me revisit hash4j, since its performance on byte-array inputs is indeed excellent.

However, the new results were quite unexpected. I provide input through our `HashKey` interface, which exposes data one byte at a time for streaming access, and in this PR I also added optimized 4-byte and 8-byte read operations on top of it. With hash4j, this means we must use its [HashFunnel](https://github.com/dynatrace-oss/hash4j/blob/main/src/main/java/com/dynatrace/hash4j/hashing/HashFunnel.java) interface instead of passing a raw byte array. While the byte-array path is very fast, hash4j's streaming path turned out to be dramatically slower. Profiling showed that its internal handling of streamed input is not very efficient, which leads to a substantial performance drop.

<img width="872" height="1091" alt="Screenshot 2026-02-21 at 12:09:51 AM" src="https://github.com/user-attachments/assets/0407cb2c-445b-4e08-83d0-9b4dde7692f6" />

- Tested here: https://github.com/jinhyukify/xxh3-benchmark/tree/hash4j (the benchmark results are available there)

Given this, I don't think we can adopt hash4j unless we change our hashing API to return a raw byte array, which I would personally like to avoid because it would introduce unnecessary allocations and GC pressure.

For this reason, I opened a separate PR that implements XXH3 using the Zero-Allocation-Hashing (ZAH) library: https://github.com/apache/hbase/pull/7772

If maintaining our own implementation is a concern (XXH3 is indeed non-trivial), ZAH is an alternative. However, it is still roughly 2× slower than the implementation in this PR.

Happy to discuss further if you have any thoughts or preferences!
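To make the input-shape constraint concrete, here is a minimal Java sketch of a streaming hash key with optimized wide reads layered on a per-byte accessor. It assumes only that the input exposes a per-byte `get(pos)` and a `length()`; the names `StreamingKey`, `ArrayKey`, `getIntLE`, `getLongLE`, and `copyToArray` are hypothetical and are not the actual HBase `HashKey` API or the methods added in this PR.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;
import java.util.Objects;

/** Hypothetical streaming input: the hasher sees bytes via get(pos), never a byte[]. */
abstract class StreamingKey {
  abstract byte get(int pos);

  abstract int length();

  /** Fallback 4-byte little-endian read, assembled from four single-byte reads. */
  int getIntLE(int pos) {
    return (get(pos) & 0xFF)
        | (get(pos + 1) & 0xFF) << 8
        | (get(pos + 2) & 0xFF) << 16
        | (get(pos + 3) & 0xFF) << 24;
  }

  /** Fallback 8-byte little-endian read, built from two 4-byte reads. */
  long getLongLE(int pos) {
    return (getIntLE(pos) & 0xFFFFFFFFL) | ((long) getIntLE(pos + 4) << 32);
  }

  /** What a byte[]-only hashing API would require: a fresh allocation per hash call. */
  static byte[] copyToArray(StreamingKey key) {
    byte[] out = new byte[key.length()];
    for (int i = 0; i < out.length; i++) {
      out[i] = key.get(i);
    }
    return out;
  }
}

/** Array-backed key that overrides the wide reads with a single VarHandle access each. */
final class ArrayKey extends StreamingKey {
  private static final VarHandle INT_LE =
      MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.LITTLE_ENDIAN);
  private static final VarHandle LONG_LE =
      MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.LITTLE_ENDIAN);

  private final byte[] bytes;
  private final int offset;
  private final int length;

  ArrayKey(byte[] bytes, int offset, int length) {
    Objects.checkFromIndexSize(offset, length, bytes.length);
    this.bytes = bytes;
    this.offset = offset;
    this.length = length;
  }

  @Override
  byte get(int pos) {
    return bytes[offset + pos];
  }

  @Override
  int length() {
    return length;
  }

  @Override
  int getIntLE(int pos) {
    // Plain VarHandle reads tolerate unaligned indexes.
    return (int) INT_LE.get(bytes, offset + pos);
  }

  @Override
  long getLongLE(int pos) {
    return (long) LONG_LE.get(bytes, offset + pos);
  }
}
```

The point of the sketch is that a hasher written against `getIntLE`/`getLongLE` can consume 4 or 8 bytes per call while the key stays a view over existing memory. An API that only accepts a contiguous `byte[]` would instead need something like `copyToArray` on every hash, which is the allocation and GC cost the comment wants to avoid.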
---

**Summary**

- I evaluated both **Zero-Allocation-Hashing** and **hash4j**. While **hash4j** shows excellent performance with raw byte-array inputs, its streaming path (which we must use because of the `HashKey` interface) is significantly slower and therefore not feasible for our use case.
- I also opened a PR with a ZAH-based implementation, but it is roughly 2× slower than this PR, especially for small and medium input sizes.
