[GitHub] [pinot] mqliang commented on a change in pull request #7622: accelerate `ByteArray.hashCode`

GitBox Fri, 22 Oct 2021 16:02:10 -0700


mqliang commented on a change in pull request #7622:
URL: https://github.com/apache/pinot/pull/7622#discussion_r734881229




##########
File path: pinot-spi/src/main/java/org/apache/pinot/spi/utils/ByteArray.java
##########
@@ -94,7 +94,23 @@ public boolean equals(Object o) {
 
   @Override
   public int hashCode() {
-    return Arrays.hashCode(_bytes);
+    int hash = 1;
+    int i = 0;
+    for (; i + 7 < _bytes.length; i += 8) {
+      hash = -1807454463 * hash
+          + 1742810335 * _bytes[i]
+          + 887503681 * _bytes[i + 1]
+          + 28629151 * _bytes[i + 2]
+          + 923521 * _bytes[i + 3]
+          + 29791 * _bytes[i + 4]
+          + 961 * _bytes[i + 5]
+          + 31 * _bytes[i + 6]
+          + _bytes[i + 7];
+    }
+    for (; i < _bytes.length; i++) {
+      hash = 31 * hash + _bytes[i];
+    }
+    return hash;

Review comment:
       Does this implementation give exactly the same hash code as 
`Arrays.hashCode(_bytes)`? I am mainly concerned about backward compatibility: 
say we have an existing realtime use case where, to ensure we partition in 
exactly the same way as Pinot, we import pinot-spi and use the old `hashCode()` 
impl to partition our Kafka topic. After this change, will things go wrong?
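
One way to check the compatibility question empirically is to compare the unrolled loop against `Arrays.hashCode` on random inputs. The following is only a sketch: the class name `UnrolledHashCheck` and the helper `matchesArraysHashCode` are hypothetical and not part of pinot-spi, and the loop body is copied from the diff above with `_bytes` passed in as a parameter.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical stand-alone check; not part of the PR.
public class UnrolledHashCheck {

  // Same arithmetic as the unrolled loop in the diff, applied to a plain byte[].
  static int unrolledHash(byte[] bytes) {
    int hash = 1;
    int i = 0;
    // Process 8 bytes per iteration; the constants are 31^8 down to 31^1
    // after wrapping to 32-bit int arithmetic.
    for (; i + 7 < bytes.length; i += 8) {
      hash = -1807454463 * hash
          + 1742810335 * bytes[i]
          + 887503681 * bytes[i + 1]
          + 28629151 * bytes[i + 2]
          + 923521 * bytes[i + 3]
          + 29791 * bytes[i + 4]
          + 961 * bytes[i + 5]
          + 31 * bytes[i + 6]
          + bytes[i + 7];
    }
    // Remaining 0..7 bytes use the classic one-byte-at-a-time step.
    for (; i < bytes.length; i++) {
      hash = 31 * hash + bytes[i];
    }
    return hash;
  }

  // Returns true if both hashes agree on every random array tried.
  static boolean matchesArraysHashCode(int trials, long seed) {
    Random random = new Random(seed);
    for (int t = 0; t < trials; t++) {
      byte[] bytes = new byte[random.nextInt(64)]; // lengths 0..63 cover both loops
      random.nextBytes(bytes);
      if (unrolledHash(bytes) != Arrays.hashCode(bytes)) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    System.out.println(matchesArraysHashCode(100_000, 42L));
  }
}
```

Since the multipliers appear to be powers of 31 reduced modulo 2^32, the two results should agree for any non-null array. A randomized check like this is not a proof, though, and a fixed set of expected hash values in a unit test would guard the partitioning contract more directly.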




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



