Re: [PR] HBASE-29889 Add XXH3 Hash Support to Bloom Filter [hbase]

via GitHub Fri, 20 Feb 2026 07:35:06 -0800


jinhyukify commented on PR #7740:
URL: https://github.com/apache/hbase/pull/7740#issuecomment-3935626270

Hello! Here is the additional context I’d like to share after re-running the
benchmarks:

Although I initially thought the JDK version might limit us, I realized that
from HBase 3 onward we are already on JDK 17, so Java version compatibility
is no longer an issue. This made me revisit the option of adopting hash4j,
since its performance with byte-array inputs is indeed excellent.

However, the new results were quite unexpected.

I provide input through our `HashKey` interface, which exposes data 1 byte
at a time for streaming access, and in this PR I also added optimized 4-byte
and 8-byte read operations on top of it. With hash4j, this means we must use
its
[HashFunnel](https://github.com/dynatrace-oss/hash4j/blob/main/src/main/java/com/dynatrace/hash4j/hashing/HashFunnel.java)
interface instead of passing a raw byte array. While the byte-array path is
very fast, the streaming path in hash4j turned out to be dramatically slower.
Profiling showed that the internal handling of streamed input is not very
efficient, and this leads to a substantial performance drop.
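To make the access pattern concrete, here is a minimal sketch of a key abstraction that exposes single-byte reads plus 4-byte and 8-byte little-endian reads. The names `ByteKey`, `ArrayKey`, `getIntLE`, and `getLongLE` are hypothetical and chosen for illustration only; they are not the actual `HashKey` API or the code in this PR.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

final class StreamingKeySketch {

  /** Hypothetical stand-in for a HashKey-style abstraction; not the actual HBase class. */
  interface ByteKey {
    byte get(int pos);

    int length();

    /** Generic fallback: assemble a little-endian int from four single-byte reads. */
    default int getIntLE(int pos) {
      return (get(pos) & 0xFF)
          | (get(pos + 1) & 0xFF) << 8
          | (get(pos + 2) & 0xFF) << 16
          | (get(pos + 3) & 0xFF) << 24;
    }

    /** Generic fallback: combine two little-endian ints into one long. */
    default long getLongLE(int pos) {
      return (getIntLE(pos) & 0xFFFFFFFFL) | ((long) getIntLE(pos + 4) << 32);
    }
  }

  /** Array-backed key that overrides the wide reads with a single memory access. */
  static final class ArrayKey implements ByteKey {
    private static final VarHandle INT_LE =
        MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.LITTLE_ENDIAN);
    private static final VarHandle LONG_LE =
        MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.LITTLE_ENDIAN);

    private final byte[] data;
    private final int offset;
    private final int length;

    ArrayKey(byte[] data, int offset, int length) {
      this.data = data;
      this.offset = offset;
      this.length = length;
    }

    @Override
    public byte get(int pos) {
      return data[offset + pos];
    }

    @Override
    public int length() {
      return length;
    }

    @Override
    public int getIntLE(int pos) {
      return (int) INT_LE.get(data, offset + pos);
    }

    @Override
    public long getLongLE(int pos) {
      return (long) LONG_LE.get(data, offset + pos);
    }
  }

  public static void main(String[] args) {
    byte[] row = {1, 2, 3, 4, 5, 6, 7, 8, 9};
    ByteKey fast = new ArrayKey(row, 0, row.length);
    ByteKey generic = new ByteKey() {
      @Override public byte get(int pos) { return row[pos]; }
      @Override public int length() { return row.length; }
    };
    // Both paths should read the same 8 bytes starting at position 1.
    System.out.println(Long.toHexString(generic.getLongLE(1)));
    System.out.println(Long.toHexString(fast.getLongLE(1)));
  }
}
```

The point of the sketch is only that a hash implementation written against such an interface can consume 8 bytes per call when the backing store allows it, while still working for keys that can only serve one byte at a time.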

- Tested here: https://github.com/jinhyukify/xxh3-benchmark/tree/hash4j
- You can check the benchmark results

Given this, I don’t think we can adopt hash4j unless we change our hashing
API to return a raw byte array, which I personally want to avoid because it
would introduce unnecessary allocations and GC pressure.
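The allocation concern can be illustrated with a small sketch. It assumes a hypothetical `ByteKey` view over a slice of a larger buffer and uses 32-bit FNV-1a purely as a short stand-in hash (it is not XXH3 and not what this PR implements). A byte-array-only hashing API forces a copy of the slice on every call, whereas a key-based API reads the bytes in place.

```java
final class CopyVersusInPlaceSketch {

  /** Hypothetical key view; stands in for a HashKey-style abstraction. */
  interface ByteKey {
    byte get(int pos);

    int length();
  }

  /** Creates a view over buf[offset, offset + length) without copying. */
  static ByteKey view(byte[] buf, int offset, int length) {
    return new ByteKey() {
      @Override public byte get(int pos) { return buf[offset + pos]; }
      @Override public int length() { return length; }
    };
  }

  /** Stand-in hash over a raw array (32-bit FNV-1a). */
  static int fnv1a(byte[] data) {
    int h = 0x811C9DC5;
    for (byte b : data) {
      h ^= (b & 0xFF);
      h *= 0x01000193;
    }
    return h;
  }

  /** Same stand-in hash, reading through the key with no intermediate array. */
  static int hashInPlace(ByteKey key) {
    int h = 0x811C9DC5;
    for (int i = 0; i < key.length(); i++) {
      h ^= (key.get(i) & 0xFF);
      h *= 0x01000193;
    }
    return h;
  }

  /** What a byte-array-only API would require: one fresh array per hashed key. */
  static int hashViaCopy(ByteKey key) {
    byte[] copy = new byte[key.length()]; // allocation on every call
    for (int i = 0; i < copy.length; i++) {
      copy[i] = key.get(i);
    }
    return fnv1a(copy);
  }
}
```

Both paths produce the same hash; the difference is the temporary array that `hashViaCopy` creates for each key, which is the cost I would like to avoid on the read path.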

For this reason, I opened a separate PR that implements XXH3 using the
Zero-Allocation-Hashing (ZAH) library:
https://github.com/apache/hbase/pull/7772

If maintaining our own implementation is a concern (XXH3 is indeed
non-trivial), ZAH is an alternative. However, it is still roughly 2× slower
than the implementation in this PR.

Happy to discuss further if you have any thoughts or preferences!

---

**Summary**

- I evaluated both **Zero-Allocation-Hashing** and **hash4j**. While
**hash4j** shows excellent performance with raw byte-array inputs, its
streaming path (which we must use due to the HashKey interface) is
significantly slower and therefore not feasible for our use case.
- I also opened a PR with a ZAH-based implementation, but it performs
roughly 2× slower than this PR, especially for small and medium input sizes.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

