kaivalnp commented on PR #14863:
URL: https://github.com/apache/lucene/pull/14863#issuecomment-3273523849
Sorry for the delay here!
I ran the following benchmarks on 768d Cohere vectors for all vector
similarities, with 4-bit (compressed) and 7-bit quantization. I had to run
10k queries to get reliable results (I saw some variance with the default of
1k queries).
### `cosine`
`main`
```
recall  latency(ms)  netCPU  avgCpuCount    nDoc  topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
 0.544        3.103   3.102        0.999  200000   100      50       32        200     4 bits     14.45      13842.75             4          670.05       659.943       74.005       HNSW
 0.505        4.499   4.497        1.000  200000   100      50       32        200     7 bits     14.03      14257.20             4          745.36       733.185      147.247       HNSW
```
This PR
```
recall  latency(ms)  netCPU  avgCpuCount    nDoc  topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
 0.543        2.854   2.852        0.999  200000   100      50       32        200     4 bits     14.57      13724.95             4          670.06       659.943       74.005       HNSW
 0.506        3.978   3.976        0.999  200000   100      50       32        200     7 bits     13.41      14912.02             4          745.09       733.185      147.247       HNSW
```
### `dot_product`
`main`
```
recall  latency(ms)  netCPU  avgCpuCount    nDoc  topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
 0.528        3.522   3.520        1.000  200000   100      50       32        200     4 bits     14.03      14258.22             4          674.69       659.943       74.005       HNSW
 0.881        4.303   4.301        1.000  200000   100      50       32        200     7 bits     14.41      13880.21             4          746.41       733.185      147.247       HNSW
```
This PR
```
recall  latency(ms)  netCPU  avgCpuCount    nDoc  topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
 0.528        3.218   3.217        1.000  200000   100      50       32        200     4 bits     13.60      14706.96             4          674.64       659.943       74.005       HNSW
 0.882        3.915   3.913        1.000  200000   100      50       32        200     7 bits     15.15      13205.68             4          746.44       733.185      147.247       HNSW
```
### `euclidean`
`main`
```
recall  latency(ms)  netCPU  avgCpuCount    nDoc  topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
 0.550        7.581   7.579        1.000  200000   100      50       32        200     4 bits     13.09      15284.68             4          667.46       659.943       74.005       HNSW
 0.936        3.938   3.937        1.000  200000   100      50       32        200     7 bits     12.88      15532.77             4          739.76       733.185      147.247       HNSW
```
This PR
```
recall  latency(ms)  netCPU  avgCpuCount    nDoc  topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
 0.550        2.422   2.420        0.999  200000   100      50       32        200     4 bits     13.27      15070.45             4          667.45       659.943       74.005       HNSW
 0.936        3.666   3.664        0.999  200000   100      50       32        200     7 bits     12.66      15796.54             4          739.73       733.185      147.247       HNSW
```
### `mip`
`main`
```
recall  latency(ms)  netCPU  avgCpuCount    nDoc  topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
 0.529        3.537   3.536        1.000  200000   100      50       32        200     4 bits     14.30      13988.95             4          674.69       659.943       74.005       HNSW
 0.882        4.280   4.278        1.000  200000   100      50       32        200     7 bits     14.18      14109.35             4          746.41       733.185      147.247       HNSW
```
This PR
```
recall  latency(ms)  netCPU  avgCpuCount    nDoc  topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
 0.529        3.332   3.330        0.999  200000   100      50       32        200     4 bits     13.89      14401.96             4          674.65       659.943       74.005       HNSW
 0.882        3.876   3.874        0.999  200000   100      50       32        200     7 bits     13.87      14423.77             4          746.43       733.185      147.247       HNSW
```
The speedup in vector search time for 4-bit `euclidean` (\~68% lower latency)
seems amazing: we used to decompress the packed bits into a `byte` and reuse
the same
[`squareDistance`](https://github.com/apache/lucene/blob/50a4f1864ef98f48abcfdd5202bd96693ee8b098/lucene/core/src/java24/org/apache/lucene/internal/vectorization/PanamaVectorUtilSupport.java#L789-L792)
function, which did not take into account that the inputs are confined to the
\[0, 15\] range, and knowing this enables some optimizations (a rough sketch
of the idea follows below).
We see a \~10% speedup in search time for everything else, while indexing is
largely unaffected.
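To illustrate the point, here is a minimal scalar sketch (not the PR's actual
code) of a square distance over 4-bit values stored one per byte; the comments
note the headroom a vectorized implementation gains once it knows the inputs
are in \[0, 15\]:
```java
// Hypothetical sketch, not Lucene's implementation: square distance over
// 4-bit quantized values stored one per byte, each guaranteed to be in [0, 15].
static int int4SquareDistance(byte[] a, byte[] b) {
  int sum = 0;
  for (int i = 0; i < a.length; i++) {
    int diff = a[i] - b[i]; // in [-15, 15]
    sum += diff * diff;     // at most 15 * 15 = 225 per element
  }
  return sum;
}
```
Since each term is at most 225, a SIMD version can accumulate partial sums in
16-bit lanes for well over a hundred elements (e.g. 145 * 225 = 32625 still
fits in a signed `short`) before widening to 32 bits, which the generic `byte`
`squareDistance` (terms up to 255 * 255) cannot assume.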
Sharing JMH benchmarks (these also check the functions for correctness):
```
java --module-path lucene/benchmark-jmh/build/benchmarks --module org.apache.lucene.benchmark.jmh "VectorUtilBenchmark.binaryHalfByte*" -p size=1024
```
```
Benchmark                                                       (size)   Mode  Cnt   Score   Error   Units
VectorUtilBenchmark.binaryHalfByteDotProductBothPackedScalar      1024  thrpt   15   2.378 ± 0.001  ops/us
VectorUtilBenchmark.binaryHalfByteDotProductBothPackedVector      1024  thrpt   15   0.472 ± 0.002  ops/us
VectorUtilBenchmark.binaryHalfByteDotProductScalar                1024  thrpt   15   2.378 ± 0.002  ops/us
VectorUtilBenchmark.binaryHalfByteDotProductSinglePackedScalar    1024  thrpt   15   2.448 ± 0.005  ops/us
VectorUtilBenchmark.binaryHalfByteDotProductSinglePackedVector    1024  thrpt   15  16.180 ± 0.082  ops/us
VectorUtilBenchmark.binaryHalfByteDotProductVector                1024  thrpt   15  20.947 ± 0.045  ops/us
VectorUtilBenchmark.binaryHalfByteSquareBothPackedScalar          1024  thrpt   15   1.642 ± 0.001  ops/us
VectorUtilBenchmark.binaryHalfByteSquareBothPackedVector          1024  thrpt   15  14.142 ± 0.031  ops/us
VectorUtilBenchmark.binaryHalfByteSquareScalar                    1024  thrpt   15   2.463 ± 0.003  ops/us
VectorUtilBenchmark.binaryHalfByteSquareSinglePackedScalar        1024  thrpt   15   2.022 ± 0.001  ops/us
VectorUtilBenchmark.binaryHalfByteSquareSinglePackedVector        1024  thrpt   15  16.340 ± 0.039  ops/us
VectorUtilBenchmark.binaryHalfByteSquareVector                    1024  thrpt   15  18.749 ± 0.055  ops/us
```
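For context on the `SinglePacked` / `BothPacked` variants above: compressed
4-bit vectors store two values per byte, one in each nibble. Assuming a simple
low/high-nibble layout for illustration (the layout Lucene actually writes may
differ), a scalar dot product of an unpacked query against a packed document
vector looks roughly like:
```java
// Illustrative only -- assumes packedDoc[i] holds two 4-bit values, low nibble
// first; not necessarily the layout Lucene's compressed format uses.
static int halfByteDotProductSinglePacked(byte[] unpackedQuery, byte[] packedDoc) {
  int sum = 0;
  for (int i = 0; i < packedDoc.length; i++) {
    int lo = packedDoc[i] & 0x0F;        // first value in the pair
    int hi = (packedDoc[i] >> 4) & 0x0F; // second value in the pair
    sum += unpackedQuery[2 * i] * lo + unpackedQuery[2 * i + 1] * hi;
  }
  return sum;
}
```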