[ 
https://issues.apache.org/jira/browse/HBASE-29889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18065530#comment-18065530
 ] 

JinHyuk Kim commented on HBASE-29889:
-------------------------------------

Applied the same optimization to the existing hash implementations (Jenkins, 
Murmur, Murmur3) so that they read 4 bytes at a time instead of assembling 
each value byte by byte.
Because the hash output must remain unchanged for compatibility, I also added 
golden tests to verify that the resulting hash values are identical to those 
of the previous implementation.
 * [https://github.com/jinhyukify/xxh3-benchmark/blob/intle/src/test/java/hash/JenkinsGoldenTest.java]

The tests confirmed that there is no behavioral change.
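
To illustrate the kind of change described above, here is a minimal sketch of reading a little-endian 32-bit value in one access versus assembling it from four bytes, plus a small golden-style equality check. It is not taken from the linked repository: the class name {{IntLeReadSketch}}, the method names, and the use of a {{VarHandle}} byte-array view (Java 9+) are assumptions made for illustration only.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

// Hypothetical illustration; names are not from HBase or the benchmark repo.
public class IntLeReadSketch {
    // View over byte[] that reads an int in little-endian order in one access.
    private static final VarHandle INT_LE =
        MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.LITTLE_ENDIAN);

    // Byte-by-byte assembly of a little-endian int.
    static int readIntLeByteByByte(byte[] data, int offset) {
        return (data[offset] & 0xff)
            | ((data[offset + 1] & 0xff) << 8)
            | ((data[offset + 2] & 0xff) << 16)
            | ((data[offset + 3] & 0xff) << 24);
    }

    // Single 4-byte read; plain get() on a byte-array view allows unaligned offsets.
    static int readIntLeAtOnce(byte[] data, int offset) {
        return (int) INT_LE.get(data, offset);
    }

    // Golden-style check: both readers must agree at every valid offset.
    static boolean readersAgree(byte[] data) {
        for (int off = 0; off + 4 <= data.length; off++) {
            if (readIntLeByteByByte(data, off) != readIntLeAtOnce(data, off)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] sample = {0x01, 0x02, 0x03, 0x04, (byte) 0xff, 0x7f, 0x00, (byte) 0x80};
        if (!readersAgree(sample)) {
            throw new AssertionError("4-byte read differs from byte-by-byte read");
        }
        System.out.println("readers agree");
    }
}
```

A real golden test would compare full hash outputs against values recorded from the previous implementation, as the linked {{JenkinsGoldenTest}} is described as doing; the check here only covers the 4-byte read step.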

This change also resulted in measurable performance improvements. The benchmark 
results are summarized below.
h1. Summary
||Length||Jenkins||*Jenkins After*||Δ%||Murmur||*Murmur After*||Δ%||Murmur3||*Murmur3 After*||Δ%||
|3|377M|*381M*|+1.2%|598M|*599M*|+0.3%|622M|*619M*|-0.5%|
|8|323M|*330M*|+2.2%|315M|*417M*|+32.5%|275M|*384M*|+39.6%|
|16|147M|*204M*|+39.0%|246M|*339M*|+37.7%|207M|*282M*|+35.8%|
|32|100M|*139M*|+38.5%|145M|*208M*|+43.3%|128M|*176M*|+37.2%|
|64|51M|*70M*|+38.2%|85M|*113M*|+32.4%|70M|*89M*|+28.0%|
|128|26M|*33M*|+27.7%|44M|*56M*|+28.4%|36M|*42M*|+15.9%|
|240|13M|*17M*|+33.6%|23M|*29M*|+23.8%|18M|*19M*|+8.0%|
|256|12M|*16M*|+31.4%|22M|*27M*|+22.0%|15M|*18M*|+23.6%|
|512|6M|*8M*|+26.5%|10M|*12M*|+14.9%|7M|*7M*|+2.0%|
|1024|3M|*4M*|+26.8%|5M|*5M*|+10.5%|3M|*4M*|+6.0%|
|2048|1M|*2M*|+27.3%|2M|*2M*|+6.3%|2M|*2M*|+5.5%|
|4096|709K|*934K*|+31.8%|1M|*1M*|+10.1%|802K|*833K*|+3.8%|
|16384|185K|*232K*|+25.6%|254K|*262K*|+3.0%|199K|*197K*|-0.8%|

 
h1. Jenkins Hash

!jenkins_intLE.png!
h1. Murmur Hash

!murmur2_intLE.png!

 
h1. Murmur3 Hash

!murmur3_intLE.png!

> Add XXH3 Hash Support to Bloom Filter
> -------------------------------------
>
>                 Key: HBASE-29889
>                 URL: https://issues.apache.org/jira/browse/HBASE-29889
>             Project: HBase
>          Issue Type: New Feature
>          Components: regionserver
>            Reporter: JinHyuk Kim
>            Assignee: JinHyuk Kim
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: jenkins_intLE.png, murmur2_intLE.png, murmur3_intLE.png, 
> xxh3-lib-performances.png
>
>
> h2. Summary
> Added *XXH3* as a new hashing option for the HBase Bloom Filter.
> h2. Background
> Existing hash functions used in HBase Bloom Filters (Jenkins, Murmur, and 
> Murmur3) were designed years ago and do not fully leverage modern CPU 
> architectures.
> [*XXH3*|https://github.com/Cyan4973/xxHash], on the other hand, is optimized 
> for today’s CPUs with wide execution units and fast unaligned memory access, 
> resulting in significantly faster hashing performance.
> h2. What Was Done
>  * Implemented XXH3 Hashing and integrated it as an available hash type for 
> Bloom Filters.
>  * Conducted benchmark tests comparing XXH3 with existing hash algorithms.
>  ** Benchmark test code is available in 
> [jinhyukify/xxh3-benchmark|https://github.com/jinhyukify/xxh3-benchmark].
>  * *Benchmark Results:*
>  ** 
> https://docs.google.com/document/d/1KcCLz3nnkDNgUUMpTIWOwvrY8kgpOJFKZHboRNt2mx0/edit?usp=sharing
> h2. Expected Impact
>  * *Faster Bloom filter lookups* across all Bloom types during client-side 
> read paths.
>  * *Slight improvement in Bloom filter write performance* during HFile 
> creation and compaction, thanks to the lower hashing overhead of XXH3.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
