shubhamvishu commented on code in PR #16092:
URL: https://github.com/apache/lucene/pull/16092#discussion_r3358308203
##########
lucene/core/src/java/org/apache/lucene/search/KnnFloatVectorQuery.java:
##########
@@ -95,6 +98,26 @@ public KnnFloatVectorQuery(
String field, float[] target, int k, Query filter, KnnSearchStrategy
searchStrategy) {
super(field, k, filter, searchStrategy);
this.target = VectorUtil.checkFinite(Objects.requireNonNull(target,
"target"));
+ this.targetPreRotated = false;
+ }
+
+ private boolean targetPreRotated;
+
+ @Override
+ public Query rewrite(IndexSearcher indexSearcher) throws IOException {
+ if (targetPreRotated == false) {
+ FieldInfo fi =
+
FieldInfos.getMergedFieldInfos(indexSearcher.getIndexReader()).fieldInfo(field);
+ if (fi != null
+ && fi.getVectorDimension() == target.length
+ &&
"true".equals(fi.getAttribute(RotationAwareKnnVectorsFormat.ROTATION_ENABLED_KEY)))
{
+ float[] rotated = new float[target.length];
+ HadamardRotation.forDimension(fi.getVectorDimension()).rotate(target,
rotated);
Review Comment:
@kaivalnp @benwtrent I agree with the concerns here and the challenge of
such complicated API. I'm thinking if for starter we should keep the rotation
per segment and keep this change restricted to rotation like a recall vs
latency tradeoff. We could follow with more ideas on how we could clawback the
latency. One idea would be to try and see if SIMD could help if some parts of
rotation logic or we could think of the sort of API as mentioned which could do
this. But doing this in this PR I feel would make this more complicated(or
errorprone since we don't have a clear vision for such an API yet). Another
good thing of doing it per segment would be that way the whole rotation logic
would be retained in the KNN vector format and not spilling into Query
implementations etc so we won't be unintentionally breaking any current use
cases but providing user an way to improve recall at the cost of some latency
overhead for starters and later optimize.
For a first iteration, I'm wondering if we should keep the rotation at the
segment level and scope this change specifically as a recall vs. latency
tradeoff?. We can then follow up with ideas to claw back some of the latency
overhead. Maybe some bits of rotation could see some acceleration with SIMD or
otherwise we could pursue the above new API to solve the latency concerns.
However, trying to address all of that in this PR feels like it would
significantly increase the complexity and keeping this change focussed on
capability addition.
I also ran JMH benchmarks(below) on Graviton 3 to see how many dot
products(byte vectors) it is equivalent to in some scenarios (it would change
with machine vs machine or quantization etc scenarios). When compared to byte
vectors dot product using Panama based vectorization it is roughly equivalent
to ~50-60 dot products max and when using optimized native kernels it is
roughly equivalent to ~200-350 dot products max. I agree might or might not be
within the range we could swallow but looking for suggestions if this overhead
seems larger than the potential overhead we initially thought or less than
that? (atleast for some specific use cases [dim, segment size] the overhead
might be less though we could easily think of a situation where the overhead
could be large like with 1000 visited nodes(dot products) visited per segment
300 could be <= 20-30% overhead roughly but with large segments it becomes less
dominant and even with Panama when compared to other faster ways).
```
Graviton 3
Benchmark (size) Mode Cnt Score
Error Units
HadamardRotationBenchmark.inverseRotate 128 thrpt 15 1.493 ±
0.024 ops/us
HadamardRotationBenchmark.inverseRotate 256 thrpt 15 0.708 ±
0.013 ops/us
HadamardRotationBenchmark.inverseRotate 702 thrpt 15 0.257 ±
0.002 ops/us
HadamardRotationBenchmark.inverseRotate 1024 thrpt 15 0.160 ±
0.001 ops/us
HadamardRotationBenchmark.inverseRotate 4096 thrpt 15 0.035 ±
0.001 ops/us
HadamardRotationBenchmark.rotate 128 thrpt 15 1.470 ±
0.004 ops/us
HadamardRotationBenchmark.rotate 256 thrpt 15 0.692 ±
0.010 ops/us
HadamardRotationBenchmark.rotate 702 thrpt 15 0.246 ±
0.001 ops/us
HadamardRotationBenchmark.rotate 1024 thrpt 15 0.155 ±
0.001 ops/us
HadamardRotationBenchmark.rotate 4096 thrpt 15 0.034 ±
0.001 ops/us
VectorUtilBenchmark.binaryDotProductScalar 128 thrpt 15 17.796 ±
0.139 ops/us
VectorUtilBenchmark.binaryDotProductScalar 256 thrpt 15 9.014 ±
0.032 ops/us
VectorUtilBenchmark.binaryDotProductScalar 702 thrpt 15 3.362 ±
0.017 ops/us
VectorUtilBenchmark.binaryDotProductScalar 1024 thrpt 15 2.285 ±
0.021 ops/us
VectorUtilBenchmark.binaryDotProductScalar 4096 thrpt 15 0.566 ±
0.004 ops/us
VectorUtilBenchmark.binaryDotProductVector 128 thrpt 15 60.224 ±
0.010 ops/us
VectorUtilBenchmark.binaryDotProductVector 256 thrpt 15 31.283 ±
0.004 ops/us
VectorUtilBenchmark.binaryDotProductVector 702 thrpt 15 11.609 ±
0.005 ops/us
VectorUtilBenchmark.binaryDotProductVector 1024 thrpt 15 8.039 ±
0.001 ops/us
VectorUtilBenchmark.binaryDotProductVector 4096 thrpt 15 2.024 ±
0.001 ops/us
VectorUtilBenchmark.dot8sNative 128 thrpt 15 135.797 ±
2.069 ops/us
VectorUtilBenchmark.dot8sNative 256 thrpt 15 90.647 ±
3.913 ops/us
VectorUtilBenchmark.dot8sNative 702 thrpt 15 49.021 ±
3.444 ops/us
VectorUtilBenchmark.dot8sNative 1024 thrpt 15 42.404 ±
0.713 ops/us
VectorUtilBenchmark.dot8sNative 4096 thrpt 15 12.222 ±
0.227 ops/us
```
| Dim | rotate (ops/μs) | binaryDotProductScalar (ops/μs) |
binaryDotProductVector (ops/μs) | dot8sNative (ops/μs) |
|-----|-----------------|-------------------------------|-------------------------------|---------------------|
| 128 | 1.47 | 17.8 (**12x**) | 60.2 (**41x**) | 135.8 (**92x**) |
| 256 | 0.69 | 9.0 (**13x**) | 31.3 (**45x**) | 90.6 (**131x**) |
| 702 | 0.25 | 3.4 (**14x**) | 11.6 (**47x**) | 49.0 (**199x**) |
| 1024 | 0.155 | 2.3 (**15x**) | 8.0 (**52x**) | 42.4 (**274x**) |
| 4096 | 0.034 | 0.57 (**17x**) | 2.0 (**60x**) | 12.2 (**359x**) |
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]