benwtrent opened a new issue, #12497:
URL: https://github.com/apache/lucene/issues/12497
### Description
Having copy-on-write segments lends itself nicely to quantization. I
propose we add a new "scalar" or "linear" quantization codec. This will be a
simple quantization codec provided in addition to the existing HNSW codec.
So, why a new codec?
- Most users don't know all their vector distributions ahead of time.
Lucene's segment structure is a nice fit for evolving quantiles.
- Can provide a 4x memory-footprint reduction for float vectors (4-byte floats quantized to single bytes).
- Search latencies can decrease significantly, as comparing scalar bytes is faster than comparing floats (not to mention the float-buffer decoding overhead disappears).
What would be required for the new codec:
- A new user setting: the quantile used when computing the scalar quantization range (absolute min/max, 99th quantile, 90th, etc.)
- Quantized vectors will have to be stored
- Quantization information will be stored
- Originally provided vectors will have to be stored. We need them on segment merges, as quantiles could differ between segments early in the index lifecycle. Over time we would expect re-quantization at segment merge to become unnecessary as the quantiles level out.
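A minimal sketch of what quantile-based scalar quantization could look like. The class and method names here are purely illustrative, not Lucene APIs, and the range derivation is one simple way to interpret the quantile setting:

```java
import java.util.Arrays;

// Hypothetical sketch of quantile-based scalar quantization. The class and
// method names are illustrative only, not Lucene APIs.
public class ScalarQuantizerSketch {

  // Derive the quantization range from the requested quantile of the observed
  // component values (quantile = 1.0 means the absolute min/max).
  static float[] range(float[] values, double quantile) {
    float[] sorted = values.clone();
    Arrays.sort(sorted);
    int n = sorted.length;
    int lower = (int) Math.floor((1.0 - quantile) * (n - 1));
    int upper = (int) Math.ceil(quantile * (n - 1));
    return new float[] {sorted[lower], sorted[upper]};
  }

  // Map each float into an unsigned byte bucket in [0, 255], clamping values
  // that fall outside the chosen range.
  static byte[] quantize(float[] vector, float min, float max) {
    float scale = 255f / (max - min);
    byte[] out = new byte[vector.length];
    for (int i = 0; i < vector.length; i++) {
      float clamped = Math.min(max, Math.max(min, vector[i]));
      out[i] = (byte) Math.round((clamped - min) * scale);
    }
    return out;
  }
}
```

The quantized bytes, the range (`min`/`max`), and the raw floats would all be written to the segment, per the storage requirements above.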
Index time concerns:
- I am not sure we can create the HNSW graph until all vectors are quantized. Some experimentation will be needed here. It may be that creating the graph in a streaming fashion and quantizing the vectors afterwards works fine.
- Segment merges will require re-quantizing the raw vectors if the quantiles change significantly. This might be tunable: if the quantiles change by only some small factor (e.g. `1e-5`), re-quantization wouldn't be required.
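As an illustration, the merge-time decision could be a simple relative-drift check on the stored range. Everything below (names, and treating the tolerance as relative to the old range) is an assumption for the sketch:

```java
// Hypothetical merge-time check: re-quantize only if the merged segment's
// quantile range drifts beyond a small relative tolerance. Not a Lucene API.
public class RequantizationCheck {

  // tolerance is relative to the old range, e.g. 1e-5 as floated in the issue.
  static boolean needsRequantization(
      float oldMin, float oldMax, float newMin, float newMax, float tolerance) {
    float oldRange = oldMax - oldMin;
    return Math.abs(newMin - oldMin) > tolerance * oldRange
        || Math.abs(newMax - oldMax) > tolerance * oldRange;
  }
}
```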
What would happen at search:
- For each segment, the query vector will be quantized according to that segment's quantile info.
- Search by default would run over the quantized vectors, scoring similarities and potentially scaling the scores back into the non-quantized range.
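The search-time flow could be sketched as below: quantize the query once per segment with that segment's range, then score with an integer dot product over the bytes. All names are hypothetical, not Lucene APIs:

```java
// Hypothetical per-segment search flow for the proposed codec.
public class SegmentSearchSketch {

  // The query is quantized with the segment's own min/max, so each segment may
  // see different bytes for the same query.
  static byte[] quantizeQuery(float[] query, float segMin, float segMax) {
    float scale = 255f / (segMax - segMin);
    byte[] out = new byte[query.length];
    for (int i = 0; i < query.length; i++) {
      float clamped = Math.min(segMax, Math.max(segMin, query[i]));
      out[i] = (byte) Math.round((clamped - segMin) * scale);
    }
    return out;
  }

  // Integer dot product over the quantized bytes (unsigned interpretation),
  // which is the cheap comparison the proposal is after.
  static int dotProduct(byte[] a, byte[] b) {
    int sum = 0;
    for (int i = 0; i < a.length; i++) {
      sum += (a[i] & 0xFF) * (b[i] & 0xFF);
    }
    return sum;
  }
}
```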
Some open questions:
- How do we handle `byte[]` searches & vectors?
    - We could quantize those into half-bytes, or just reject them for now.
- Do we want to provide an option to re-score over the non-quantized vectors, or a way to access them directly?
- How do we handle scoring from similarities?
- A dot-product between non-quantized floats and one between quantized bytes are very different. Though, I think this is solvable for this codec (meaning scores will be within similar ranges for non-quantized and quantized vectors).
- Would we benefit from having a “quantization limit”, meaning a segment isn’t quantized until some limit is reached? To support this, quantized and non-quantized scores would have to fall within similar ranges, as you don’t want scoring to be dramatically different between quantized and regular vectors.
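On the scoring question: for a linear quantization `x_i ≈ min + alpha * q_i`, the float dot product can be recovered from the quantized one via a correction term, which suggests quantized and raw scores can indeed be kept in similar ranges. A hypothetical sketch (not a Lucene API, and only one possible correction scheme):

```java
// Hypothetical score correction for linearly quantized vectors.
// With x_i ≈ min + alpha * q_i and y_i ≈ min + alpha * r_i, expanding the
// float dot product gives:
//   x . y ≈ alpha^2 * (q . r) + alpha * min * (sum(q) + sum(r)) + dim * min^2
public class ScoreCorrectionSketch {

  static double correctedDot(byte[] q, byte[] r, float min, float alpha) {
    long dot = 0, sumQ = 0, sumR = 0;
    for (int i = 0; i < q.length; i++) {
      int qi = q[i] & 0xFF;
      int ri = r[i] & 0xFF;
      dot += (long) qi * ri;
      sumQ += qi;
      sumR += ri;
    }
    double a = alpha;
    // Expansion of (min + a*q_i)(min + a*r_i) summed over all dimensions.
    return a * a * dot + a * min * (sumQ + sumR) + q.length * (double) min * min;
  }
}
```

With range [-1, 1] (so `alpha = 2/255`), the vector `{0.5, -0.5}` quantizes to bytes `{191, 64}`, and the corrected quantized self-dot-product lands close to the true float value of `0.5`, illustrating why quantized and non-quantized scores can coexist.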
Useful resources:
https://qdrant.tech/articles/scalar-quantization/
https://zilliz.com/blog/scalar-quantization-and-product-quantization
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]