Re: [PR] Add hadamard rotation to vector fields [lucene]

via GitHub Thu, 04 Jun 2026 12:17:50 -0700


shubhamvishu commented on code in PR #16092:
URL: https://github.com/apache/lucene/pull/16092#discussion_r3358308203



##########
lucene/core/src/java/org/apache/lucene/search/KnnFloatVectorQuery.java:
##########
@@ -95,6 +98,26 @@ public KnnFloatVectorQuery(
       String field, float[] target, int k, Query filter, KnnSearchStrategy 
searchStrategy) {
     super(field, k, filter, searchStrategy);
     this.target = VectorUtil.checkFinite(Objects.requireNonNull(target, 
"target"));
+    this.targetPreRotated = false;
+  }
+
+  private boolean targetPreRotated;
+
+  @Override
+  public Query rewrite(IndexSearcher indexSearcher) throws IOException {
+    if (targetPreRotated == false) {
+      FieldInfo fi =
+          
FieldInfos.getMergedFieldInfos(indexSearcher.getIndexReader()).fieldInfo(field);
+      if (fi != null
+          && fi.getVectorDimension() == target.length
+          && 
"true".equals(fi.getAttribute(RotationAwareKnnVectorsFormat.ROTATION_ENABLED_KEY)))
 {
+        float[] rotated = new float[target.length];
+        HadamardRotation.forDimension(fi.getVectorDimension()).rotate(target, 
rotated);

Review Comment:
   @kaivalnp @benwtrent I agree with the concerns here and the challenge of 
such complicated API. I'm thinking if for starter we should keep the rotation 
per segment and keep this change restricted to rotation like a recall vs 
latency tradeoff. We could follow with more ideas on how we could clawback the 
latency. One idea would be to try and see if SIMD could help if some parts of 
rotation logic or we could think of the sort of API as mentioned which could do 
this. But doing this in this PR I feel would make this more complicated(or 
errorprone since we don't have a clear vision for such an API yet). Another 
good thing of doing it per segment would be that way the whole rotation logic 
would be retained in the KNN vector format and not spilling into Query 
implementations etc so we won't be unintentionally breaking any current use 
cases but providing user an way to improve recall at the cost of some latency 
overhead for starters and later optimize.
   
   For a first iteration, I'm wondering if we should keep the rotation at the 
segment level and scope this change specifically as a recall vs. latency 
tradeoff?. We can then follow up with ideas to claw back some of the latency 
overhead. Maybe some bits of rotation could see some acceleration with SIMD or 
otherwise we could pursue the above new API to solve the latency concerns. 
However, trying to address all of that in this PR feels like it would 
significantly increase the complexity and keeping this change focussed on 
capability addition.
   
   I also ran JMH benchmarks(below) on Graviton 3 to see how many dot 
products(byte vectors) it is equivalent to in some scenarios (it would change 
with machine vs machine or quantization etc scenarios). When compared to byte 
vectors dot product using Panama based vectorization it is roughly equivalent 
to ~50-60 dot products max and when using optimized native kernels it is 
roughly equivalent to ~200-350 dot products max. I agree might or might not be 
within the range we could swallow but looking for suggestions if this overhead 
seems larger than the potential overhead we initially thought or less than 
that? (atleast for some specific use cases [dim, segment size] the overhead 
might be less though we could easily think of a situation where the overhead 
could be large like with 1000 visited nodes(dot products) visited per segment 
300 could be <= 20-30% overhead roughly but with large segments it becomes less 
dominant and even with Panama when compared to other faster ways).
   
   ```
   Graviton 3
   
   Benchmark                                   (size)   Mode  Cnt    Score    
Error   Units
   HadamardRotationBenchmark.inverseRotate        128  thrpt   15    1.493 ±  
0.024  ops/us
   HadamardRotationBenchmark.inverseRotate        256  thrpt   15    0.708 ±  
0.013  ops/us
   HadamardRotationBenchmark.inverseRotate        702  thrpt   15    0.257 ±  
0.002  ops/us
   HadamardRotationBenchmark.inverseRotate       1024  thrpt   15    0.160 ±  
0.001  ops/us
   HadamardRotationBenchmark.inverseRotate       4096  thrpt   15    0.035 ±  
0.001  ops/us
   HadamardRotationBenchmark.rotate               128  thrpt   15    1.470 ±  
0.004  ops/us
   HadamardRotationBenchmark.rotate               256  thrpt   15    0.692 ±  
0.010  ops/us
   HadamardRotationBenchmark.rotate               702  thrpt   15    0.246 ±  
0.001  ops/us
   HadamardRotationBenchmark.rotate              1024  thrpt   15    0.155 ±  
0.001  ops/us
   HadamardRotationBenchmark.rotate              4096  thrpt   15    0.034 ±  
0.001  ops/us
   VectorUtilBenchmark.binaryDotProductScalar     128  thrpt   15   17.796 ±  
0.139  ops/us
   VectorUtilBenchmark.binaryDotProductScalar     256  thrpt   15    9.014 ±  
0.032  ops/us
   VectorUtilBenchmark.binaryDotProductScalar     702  thrpt   15    3.362 ±  
0.017  ops/us
   VectorUtilBenchmark.binaryDotProductScalar    1024  thrpt   15    2.285 ±  
0.021  ops/us
   VectorUtilBenchmark.binaryDotProductScalar    4096  thrpt   15    0.566 ±  
0.004  ops/us
   VectorUtilBenchmark.binaryDotProductVector     128  thrpt   15   60.224 ±  
0.010  ops/us
   VectorUtilBenchmark.binaryDotProductVector     256  thrpt   15   31.283 ±  
0.004  ops/us
   VectorUtilBenchmark.binaryDotProductVector     702  thrpt   15   11.609 ±  
0.005  ops/us
   VectorUtilBenchmark.binaryDotProductVector    1024  thrpt   15    8.039 ±  
0.001  ops/us
   VectorUtilBenchmark.binaryDotProductVector    4096  thrpt   15    2.024 ±  
0.001  ops/us
   VectorUtilBenchmark.dot8sNative                128  thrpt   15  135.797 ±  
2.069  ops/us
   VectorUtilBenchmark.dot8sNative                256  thrpt   15   90.647 ±  
3.913  ops/us
   VectorUtilBenchmark.dot8sNative                702  thrpt   15   49.021 ±  
3.444  ops/us
   VectorUtilBenchmark.dot8sNative               1024  thrpt   15   42.404 ±  
0.713  ops/us
   VectorUtilBenchmark.dot8sNative               4096  thrpt   15   12.222 ±  
0.227  ops/us
   ```
   
   | Dim | rotate (ops/μs) | binaryDotProductScalar (ops/μs) | 
binaryDotProductVector (ops/μs) | dot8sNative (ops/μs) |
     
|-----|-----------------|-------------------------------|-------------------------------|---------------------|
     | 128 | 1.47 | 17.8 (**12x**) | 60.2 (**41x**) | 135.8 (**92x**) |
     | 256 | 0.69 | 9.0 (**13x**) | 31.3 (**45x**) | 90.6 (**131x**) |
     | 702 | 0.25 | 3.4 (**14x**) | 11.6 (**47x**) | 49.0 (**199x**) |
     | 1024 | 0.155 | 2.3 (**15x**) | 8.0 (**52x**) | 42.4 (**274x**) |
     | 4096 | 0.034 | 0.57 (**17x**) | 2.0 (**60x**) | 12.2 (**359x**) |



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] Add hadamard rotation to vector fields [lucene]

Reply via email to