mccullocht commented on code in PR #16092:
URL: https://github.com/apache/lucene/pull/16092#discussion_r3278462464


##########
lucene/core/src/java/org/apache/lucene/codecs/lucene104/Lucene104ScalarQuantizedVectorsReader.java:
##########
@@ -212,7 +236,22 @@ public RandomVectorScorer getRandomVectorScorer(String 
field, float[] target) th
             fi.vectorDataOffset,
             fi.vectorDataLength,
             quantizedVectorData),
-        target);
+        scoringTarget);
+  }
+
+  private HadamardRotation rotationFor(String field, int dimension) {
+    return rotations.computeIfAbsent(
+        field,
+        f ->
+            HadamardRotation.create(
+                dimension, 
Lucene104ScalarQuantizedVectorsFormat.rotationSeed(f)));
+  }
+
+  private boolean isRotationEnabled(String field) {
+    FieldInfo info = fieldInfos.fieldInfo(field);
+    return info != null
+        && "true"
+            
.equals(info.getAttribute(Lucene104ScalarQuantizedVectorsFormat.ROTATION_ENABLED_KEY));

Review Comment:
   I haven't ever seen attributes used to enable codec features like this, at 
least it doesn't appear to be common practice in the core codecs. More 
typically this would tick VERSION_CURRENT and something (probably just the 
rotation seed?) would be encoded as part of field metadata. I don't have a good 
sense as to why maybe @mikemccand or @benwtrent has a firmer and better 
reasoned opinion.



##########
lucene/core/src/java/org/apache/lucene/codecs/lucene104/Lucene104ScalarQuantizedVectorsReader.java:
##########
@@ -197,6 +213,14 @@ public RandomVectorScorer getRandomVectorScorer(String 
field, float[] target) th
     if (fi == null) {
       return null;
     }
+    // Rotate the query vector to match the rotated stored vectors.
+    float[] scoringTarget = target;
+    if (isRotationEnabled(field) && target != null) {
+      HadamardRotation rotation = rotationFor(field, fi.dimension);
+      float[] rotated = new float[target.length];
+      rotation.rotate(target, rotated);

Review Comment:
   IIUC this rotation operation is probably fairly expensive -- at least as 
expensive as quantization but possibly even more expensive. In Lucene's segment 
structure this operation will be repeated for every _segment_ searched. In your 
tests you probably ran with everything merged down to a single segment but I'm 
interested in what this costs in a more typical multi-segment setup. A 
microbenchmark for the rotation or a full luceneutil run would be helpful here.
   
   Depending on the cost we might want to figure out a way to reuse this 
computation across segments somehow which probably requires upstream 
integration with the knn query classes.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to