shubhamvishu commented on code in PR #16092:
URL: https://github.com/apache/lucene/pull/16092#discussion_r3358308203
##########
lucene/core/src/java/org/apache/lucene/search/KnnFloatVectorQuery.java:
##########
@@ -95,6 +98,26 @@ public KnnFloatVectorQuery(
String field, float[] target, int k, Query filter, KnnSearchStrategy
searchStrategy) {
super(field, k, filter, searchStrategy);
this.target = VectorUtil.checkFinite(Objects.requireNonNull(target,
"target"));
+ this.targetPreRotated = false;
+ }
+
+ private boolean targetPreRotated;
+
+ @Override
+ public Query rewrite(IndexSearcher indexSearcher) throws IOException {
+ if (targetPreRotated == false) {
+ FieldInfo fi =
+
FieldInfos.getMergedFieldInfos(indexSearcher.getIndexReader()).fieldInfo(field);
+ if (fi != null
+ && fi.getVectorDimension() == target.length
+ &&
"true".equals(fi.getAttribute(RotationAwareKnnVectorsFormat.ROTATION_ENABLED_KEY)))
{
+ float[] rotated = new float[target.length];
+ HadamardRotation.forDimension(fi.getVectorDimension()).rotate(target,
rotated);
Review Comment:
@kaivalnp @benwtrent I agree with the concerns here and the challenge of
such complicated API. I'm thinking if for starter we should keep the rotation
per segment and keep this change restricted to rotation like a recall vs
latency tradeoff. We could follow with more ideas on how we could clawback the
latency. One idea would be to try and see if SIMD could help if some parts of
rotation logic or we could think of the sort of API as mentioned which could do
this. But doing this in this PR I feel would make this more complicated(or
errorprone since we don't have a clear vision for such an API yet). Another
good thing of doing it per segment would be that way the whole rotation logic
would be retained in the KNN vector format and not spilling into Query
implementations etc so we won't be unintentionally breaking any current use
cases but providing user an way to improve recall at the cost of some latency
overhead for starters and later optimize (keeping this change focussed on capabi
lity addition).
I also ran JMH benchmarks(below) on Graviton 3 to see how many dot
products(byte vectors) a single rotation operation is equivalent to in some
scenarios (it would change with machine vs machine or different quantization
etc scenarios). When compared to byte vectors dot product (using Panama based
vectorization), a single rotation operation is roughly equivalent to ~50-60 dot
products max and when using optimized native kernels it is roughly equivalent
to ~200-350 dot products max. I agree might or might not be within the range we
could swallow but looking for suggestions if this overhead seems larger than
the potential overhead we initially thought or less than that? (atleast for
some specific use cases [dim, segment size] the overhead might be less though
we could easily think of a situation where the overhead could be large like
with 1000 visited nodes(dot products) visited per segment 300 could be <=
20-30% overhead roughly but with large segments it becomes less dominant and e
ven with Panama when compared to other faster ways).
```
Graviton 3
Benchmark (size) Mode Cnt Score
Error Units
HadamardRotationBenchmark.inverseRotate 128 thrpt 15 1.493 ±
0.024 ops/us
HadamardRotationBenchmark.inverseRotate 256 thrpt 15 0.708 ±
0.013 ops/us
HadamardRotationBenchmark.inverseRotate 702 thrpt 15 0.257 ±
0.002 ops/us
HadamardRotationBenchmark.inverseRotate 1024 thrpt 15 0.160 ±
0.001 ops/us
HadamardRotationBenchmark.inverseRotate 4096 thrpt 15 0.035 ±
0.001 ops/us
HadamardRotationBenchmark.rotate 128 thrpt 15 1.470 ±
0.004 ops/us
HadamardRotationBenchmark.rotate 256 thrpt 15 0.692 ±
0.010 ops/us
HadamardRotationBenchmark.rotate 702 thrpt 15 0.246 ±
0.001 ops/us
HadamardRotationBenchmark.rotate 1024 thrpt 15 0.155 ±
0.001 ops/us
HadamardRotationBenchmark.rotate 4096 thrpt 15 0.034 ±
0.001 ops/us
VectorUtilBenchmark.binaryDotProductScalar 128 thrpt 15 17.796 ±
0.139 ops/us
VectorUtilBenchmark.binaryDotProductScalar 256 thrpt 15 9.014 ±
0.032 ops/us
VectorUtilBenchmark.binaryDotProductScalar 702 thrpt 15 3.362 ±
0.017 ops/us
VectorUtilBenchmark.binaryDotProductScalar 1024 thrpt 15 2.285 ±
0.021 ops/us
VectorUtilBenchmark.binaryDotProductScalar 4096 thrpt 15 0.566 ±
0.004 ops/us
VectorUtilBenchmark.binaryDotProductVector 128 thrpt 15 60.224 ±
0.010 ops/us
VectorUtilBenchmark.binaryDotProductVector 256 thrpt 15 31.283 ±
0.004 ops/us
VectorUtilBenchmark.binaryDotProductVector 702 thrpt 15 11.609 ±
0.005 ops/us
VectorUtilBenchmark.binaryDotProductVector 1024 thrpt 15 8.039 ±
0.001 ops/us
VectorUtilBenchmark.binaryDotProductVector 4096 thrpt 15 2.024 ±
0.001 ops/us
VectorUtilBenchmark.dot8sNative 128 thrpt 15 135.797 ±
2.069 ops/us
VectorUtilBenchmark.dot8sNative 256 thrpt 15 90.647 ±
3.913 ops/us
VectorUtilBenchmark.dot8sNative 702 thrpt 15 49.021 ±
3.444 ops/us
VectorUtilBenchmark.dot8sNative 1024 thrpt 15 42.404 ±
0.713 ops/us
VectorUtilBenchmark.dot8sNative 4096 thrpt 15 12.222 ±
0.227 ops/us
```
| Dim | rotate (ops/μs) | binaryDotProductScalar (ops/μs) |
binaryDotProductVector (ops/μs) | dot8sNative (ops/μs) |
|-----|-----------------|-------------------------------|-------------------------------|---------------------|
| 128 | 1.47 | 17.8 (**12x**) | 60.2 (**41x**) | 135.8 (**92x**) |
| 256 | 0.69 | 9.0 (**13x**) | 31.3 (**45x**) | 90.6 (**131x**) |
| 702 | 0.25 | 3.4 (**14x**) | 11.6 (**47x**) | 49.0 (**199x**) |
| 1024 | 0.155 | 2.3 (**15x**) | 8.0 (**52x**) | 42.4 (**274x**) |
| 4096 | 0.034 | 0.57 (**17x**) | 2.0 (**60x**) | 12.2 (**359x**) |
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]