shubhamvishu commented on code in PR #12868:
URL: https://github.com/apache/lucene/pull/12868#discussion_r1414130332


##########
lucene/codecs/src/java/org/apache/lucene/codecs/bloom/FuzzySet.java:
##########
@@ -150,9 +150,10 @@ private FuzzySet(FixedBitSet filter, int bloomSize, int hashCount) {
    * @return NO or MAYBE
    */
   public ContainsResult contains(BytesRef value) {
-    long hash = hashFunction.hash(value);
-    int msb = (int) (hash >>> Integer.SIZE);
-    int lsb = (int) hash;
+    long[] hash = hashFunction.hash128(value);
+
+    int msb = ((int) hash[0] >>> Integer.SIZE) >>> 1 + ((int) hash[1] >>> Integer.SIZE) >>> 1;
+    int lsb = ((int) hash[0]) >>> 1 + ((int) hash[1]) >>> 1;

Review Comment:
   I think operator precedence keeps this expression from doing exactly what I initially intended, although we actually see better performance with it. We should change it to the following (and maybe re-check the results?):
   
   ```java
   int msb = (int) (hash[0] >>> (Integer.SIZE + 1)) + (int) (hash[1] >>> (Integer.SIZE + 1));
   int lsb = ((int) hash[0] >>> 1) + ((int) hash[1] >>> 1);
   ```
   
   **UPDATE:** After changing to the expression above, I see a performance regression of **2.6%**, so I think it's fine to keep it as is? Or is there anything I might be missing?
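   A minimal standalone sketch of how javac parses the two lines from the diff, assuming `hash128` returns a `long[2]` as the diff suggests (the class `ShiftPrecedenceDemo` and its method names are hypothetical, not part of Lucene). Two rules combine here: `+` binds tighter than `>>>`, so `a >>> 1 + b >>> 1` becomes `(a >>> (1 + b)) >>> 1`; and the cast applies before the shift while an `int` shift distance is masked to its low 5 bits, so `(int) hash[0] >>> Integer.SIZE` shifts by 0. One consequence is that `msb` and `lsb`, as written in the diff, always evaluate to the same value.

   ```java
   // Hypothetical standalone demo (not part of Lucene): how javac parses the
   // msb/lsb lines from the diff, compared with the corrected version above.
   final class ShiftPrecedenceDemo {

     // msb line from the diff, verbatim.
     static int msbAsWritten(long[] hash) {
       return ((int) hash[0] >>> Integer.SIZE) >>> 1 + ((int) hash[1] >>> Integer.SIZE) >>> 1;
     }

     // lsb line from the diff, verbatim.
     static int lsbAsWritten(long[] hash) {
       return ((int) hash[0]) >>> 1 + ((int) hash[1]) >>> 1;
     }

     // What javac actually evaluates for BOTH lines above:
     //  - the cast binds before '>>>', and an int shift distance is masked to
     //    its low 5 bits, so '(int) x >>> Integer.SIZE' shifts by 0;
     //  - '+' binds tighter than '>>>', and '>>>' is left-associative.
     static int asParsed(long[] hash) {
       return (((int) hash[0]) >>> (1 + (int) hash[1])) >>> 1;
     }

     // Corrected msb from the review comment: high 32 bits of each long,
     // halved (unsigned shift), then summed.
     static int msbIntended(long[] hash) {
       return (int) (hash[0] >>> (Integer.SIZE + 1)) + (int) (hash[1] >>> (Integer.SIZE + 1));
     }

     // Corrected lsb from the review comment: low 32 bits of each long,
     // halved (unsigned shift), then summed.
     static int lsbIntended(long[] hash) {
       return ((int) hash[0] >>> 1) + ((int) hash[1] >>> 1);
     }

     public static void main(String[] args) {
       // Assumed sample value: high words 5 and 7, low words 8 and 10.
       long[] hash = {0x0000000500000008L, 0x000000070000000AL};
       System.out.println("msb as written: " + msbAsWritten(hash)); // prints "msb as written: 0"
       System.out.println("lsb as written: " + lsbAsWritten(hash)); // prints "lsb as written: 0"
       System.out.println("msb intended: " + msbIntended(hash)); // prints "msb intended: 5"
       System.out.println("lsb intended: " + lsbIntended(hash)); // prints "lsb intended: 9"
     }
   }
   ```

   Because `1 + (int) hash[1]` ends up as an `int` shift distance, only its low 5 bits matter, so the as-written value depends on just 5 bits of `hash[1]`.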



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
