oertl opened a new pull request, #20359:
URL: https://github.com/apache/kafka/pull/20359

   This PR optimizes the Murmur2 hash computation. Instead of processing bytes 
individually, the change uses a `VarHandle` to load 4 bytes at a time from the 
given `data` array, which yields a significant speedup. The PR consists of 
three commits: the first extends the unit tests for better coverage, the second 
introduces a JMH benchmark for measuring the speedup, and the third changes the 
Murmur2 implementation.
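
As a rough illustration of the idea (a minimal sketch, not the code from this PR), the snippet below compares a byte-by-byte Murmur2 with a variant that reads each 4-byte block through a little-endian `VarHandle`. The class name `Murmur2Sketch` and its method names are hypothetical; the constants and mixing steps follow the commonly used 32-bit Murmur2 with the seed `0x9747b28c`, which is assumed here to match Kafka's variant.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

final class Murmur2Sketch {
    // Little-endian int view over a byte[]; plain get() accepts unaligned byte offsets.
    private static final VarHandle INT_LE =
        MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.LITTLE_ENDIAN);

    private static final int SEED = 0x9747b28c; // assumed seed
    private static final int M = 0x5bd1e995;
    private static final int R = 24;

    // Mixes one 4-byte block k into the running hash h.
    private static int mix(int h, int k) {
        k *= M;
        k ^= k >>> R;
        k *= M;
        h *= M;
        return h ^ k;
    }

    // Handles the 0 to 3 trailing bytes and applies the final avalanche.
    private static int finish(byte[] data, int h) {
        int tail = data.length & ~3;
        switch (data.length & 3) {
            case 3: h ^= (data[tail + 2] & 0xff) << 16; // fall through
            case 2: h ^= (data[tail + 1] & 0xff) << 8;  // fall through
            case 1: h ^= data[tail] & 0xff;
                    h *= M;
        }
        h ^= h >>> 13;
        h *= M;
        h ^= h >>> 15;
        return h;
    }

    // Baseline: assembles every 4-byte block from individual byte loads.
    static int murmur2Bytewise(byte[] data) {
        int h = SEED ^ data.length;
        for (int i = 0; i + 4 <= data.length; i += 4) {
            int k = (data[i] & 0xff)
                  | (data[i + 1] & 0xff) << 8
                  | (data[i + 2] & 0xff) << 16
                  | (data[i + 3] & 0xff) << 24;
            h = mix(h, k);
        }
        return finish(data, h);
    }

    // Variant: loads every 4-byte block with a single VarHandle read.
    static int murmur2Wide(byte[] data) {
        int h = SEED ^ data.length;
        for (int i = 0; i + 4 <= data.length; i += 4) {
            h = mix(h, (int) INT_LE.get(data, i));
        }
        return finish(data, h);
    }

    public static void main(String[] args) {
        for (int n = 0; n <= 64; n++) {
            byte[] d = new byte[n];
            for (int i = 0; i < n; i++) d[i] = (byte) (i * 31 + n);
            if (murmur2Bytewise(d) != murmur2Wide(d)) {
                throw new AssertionError("mismatch at length " + n);
            }
        }
        System.out.println("both variants agree for lengths 0..64");
    }
}
```

The byte order is fixed to little-endian so that the wide read produces the same `int` as the byte-wise assembly on every platform.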
   
   The benchmark results before and after this change (measured locally on my 
notebook) are shown below. The test case name encodes the range of input sizes: 
`TEST_CASE_1_64` means that the input size varies randomly from 1 to 64 bytes, 
which makes branch prediction difficult, while `TEST_CASE_4_4` uses a constant 
input length of 4 bytes, which could be beneficial for branch prediction. All 
test cases show a speedup, ranging from roughly 1.3x to 1.8x.
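
To make the naming scheme concrete, here is a minimal sketch of how inputs for a `TEST_CASE_<min>_<max>` case might be generated. It is an assumption for illustration only and not the benchmark code from this PR; the class `InputSketch` and the method `randomInputs` are hypothetical names.

```java
import java.util.Random;

final class InputSketch {
    // Hypothetical helper: builds `count` arrays whose lengths are drawn
    // uniformly from [minLen, maxLen] and fills them with random bytes.
    static byte[][] randomInputs(int count, int minLen, int maxLen, long seed) {
        Random rnd = new Random(seed);
        byte[][] inputs = new byte[count][];
        for (int i = 0; i < count; i++) {
            int len = minLen + rnd.nextInt(maxLen - minLen + 1);
            inputs[i] = new byte[len];
            rnd.nextBytes(inputs[i]);
        }
        return inputs;
    }

    public static void main(String[] args) {
        // Analogous to TEST_CASE_1_64: lengths vary, so loop trip counts are hard to predict.
        byte[][] varying = randomInputs(5, 1, 64, 42L);
        // Analogous to TEST_CASE_4_4: every array has exactly 4 bytes.
        byte[][] constant = randomInputs(5, 4, 4, 42L);
        for (int i = 0; i < 5; i++) {
            System.out.println(varying[i].length + " vs " + constant[i].length);
        }
    }
}
```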
   
   **Before:**
   ```
   Benchmark                          (testCase)  Mode  Cnt    Score   Error  Units
   Murmur2Benchmark.hashBytes      TEST_CASE_1_4  avgt   20    5.613 ± 0.110  us/op
   Murmur2Benchmark.hashBytes     TEST_CASE_1_16  avgt   20   10.646 ± 0.086  us/op
   Murmur2Benchmark.hashBytes     TEST_CASE_1_64  avgt   20   24.313 ± 0.371  us/op
   Murmur2Benchmark.hashBytes    TEST_CASE_1_256  avgt   20   77.239 ± 0.651  us/op
   Murmur2Benchmark.hashBytes      TEST_CASE_4_4  avgt   20    7.939 ± 0.061  us/op
   Murmur2Benchmark.hashBytes    TEST_CASE_16_16  avgt   20   15.769 ± 0.180  us/op
   Murmur2Benchmark.hashBytes    TEST_CASE_64_64  avgt   20   40.739 ± 0.617  us/op
   Murmur2Benchmark.hashBytes  TEST_CASE_256_256  avgt   20  141.524 ± 1.599  us/op
   ```
   
   **After:**
   ```
   Benchmark                          (testCase)  Mode  Cnt   Score   Error  Units
   Murmur2Benchmark.hashBytes      TEST_CASE_1_4  avgt   20   3.515 ± 0.072  us/op
   Murmur2Benchmark.hashBytes     TEST_CASE_1_16  avgt   20   7.918 ± 0.102  us/op
   Murmur2Benchmark.hashBytes     TEST_CASE_1_64  avgt   20  16.257 ± 0.254  us/op
   Murmur2Benchmark.hashBytes    TEST_CASE_1_256  avgt   20  46.919 ± 0.590  us/op
   Murmur2Benchmark.hashBytes      TEST_CASE_4_4  avgt   20   6.277 ± 0.137  us/op
   Murmur2Benchmark.hashBytes    TEST_CASE_16_16  avgt   20   9.768 ± 0.138  us/op
   Murmur2Benchmark.hashBytes    TEST_CASE_64_64  avgt   20  22.950 ± 0.289  us/op
   Murmur2Benchmark.hashBytes  TEST_CASE_256_256  avgt   20  81.898 ± 1.405  us/op
   ```
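
For reference, the per-case speedup can be derived by dividing each "before" score by the matching "after" score. The following sketch does that arithmetic on the numbers from the two tables; the class name `Speedups` is hypothetical, and the error margins are ignored for simplicity.

```java
final class Speedups {
    static final String[] CASES = {
        "TEST_CASE_1_4", "TEST_CASE_1_16", "TEST_CASE_1_64", "TEST_CASE_1_256",
        "TEST_CASE_4_4", "TEST_CASE_16_16", "TEST_CASE_64_64", "TEST_CASE_256_256"
    };
    // Average time per operation in us/op, copied from the tables above.
    static final double[] BEFORE = {5.613, 10.646, 24.313, 77.239, 7.939, 15.769, 40.739, 141.524};
    static final double[] AFTER  = {3.515,  7.918, 16.257, 46.919, 6.277,  9.768, 22.950,  81.898};

    // Speedup factor per test case: time before divided by time after.
    static double[] ratios() {
        double[] r = new double[BEFORE.length];
        for (int i = 0; i < r.length; i++) {
            r[i] = BEFORE[i] / AFTER[i];
        }
        return r;
    }

    public static void main(String[] args) {
        double[] r = ratios();
        for (int i = 0; i < r.length; i++) {
            System.out.println(String.format("%-18s %.2fx", CASES[i], r[i]));
        }
    }
}
```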


