benwtrent opened a new issue, #12497:
URL: https://github.com/apache/lucene/issues/12497

   ### Description
   
Having copy-on-write segments lends itself nicely to quantization. I 
propose we add a new "scalar" or "linear" quantization codec. This would be a 
simple quantization codec provided in addition to the existing HNSW codec. 
   
   So, why a new codec?
   
    - Most users don't know their vector distributions ahead of time. 
Lucene's segment structure is a nice fit for evolving quantiles.
    - Can provide a 4x reduction in memory footprint for float vectors.
    - Search latencies can decrease significantly, since comparing scalar bytes 
is faster than comparing floats (not to mention the float-buffer decoding 
overhead disappears)
   
   What would be required for the new codec:
   
    - New settings for the user: the quantile used for scalar quantization 
(absolute min/max, 99th percentile, 90th, etc.)
    - Quantized vectors will have to be stored
    - Quantization information will have to be stored
    - Originally provided vectors will have to be stored. We need these at 
segment merge, as quantiles could differ between segments early in the 
index lifecycle. We would expect that over time, re-quantization at segment 
merge will not be required as the quantiles level out. 
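   As a sketch of what the quantile setting could look like in practice (all names here are illustrative, not a proposed Lucene API): derive bounds from a sample of vector components, then linearly map floats onto bytes.

   ```java
   import java.util.Arrays;

   // Hypothetical sketch, not a Lucene API: derive quantile bounds from a
   // sample and linearly map float components into the byte range [0, 127].
   public class ScalarQuantizer {
     final float minQuantile;
     final float maxQuantile;

     ScalarQuantizer(float minQuantile, float maxQuantile) {
       this.minQuantile = minQuantile;
       this.maxQuantile = maxQuantile;
     }

     // Derive lower/upper bounds from a sample of vector components, clipping
     // the tails so the central `quantile` fraction of values is kept
     // (quantile = 1.0 means absolute min/max; 0.99 clips 0.5% off each tail).
     static ScalarQuantizer fromSample(float[] sample, float quantile) {
       float[] sorted = sample.clone();
       Arrays.sort(sorted);
       int clip = (int) ((1f - quantile) / 2f * sorted.length);
       return new ScalarQuantizer(sorted[clip], sorted[sorted.length - 1 - clip]);
     }

     // Clamp each component to the bounds and map linearly onto [0, 127].
     byte[] quantize(float[] vector) {
       byte[] out = new byte[vector.length];
       float scale = 127f / (maxQuantile - minQuantile);
       for (int i = 0; i < vector.length; i++) {
         float clamped = Math.min(maxQuantile, Math.max(minQuantile, vector[i]));
         out[i] = (byte) Math.round((clamped - minQuantile) * scale);
       }
       return out;
     }

     public static void main(String[] args) {
       ScalarQuantizer sq = new ScalarQuantizer(0f, 1f);
       System.out.println(Arrays.toString(sq.quantize(new float[] {0f, 0.5f, 1f, 2f})));
     }
   }
   ```

   Storing `minQuantile`/`maxQuantile` per segment would be the "quantization information" above; the original floats are what `fromSample` would need again at merge time.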
   
   Index time concerns:
    - I am not sure we can create the HNSW graph until all vectors are 
quantized. Some experimentation will be needed here. It may be that 
creating the graph in a streaming fashion and quantizing the vectors later 
works fine.
    - Segment merges require re-quantization of raw vectors if the quantiles 
change significantly. This might be tunable: if the quantiles change by only 
some small factor (`1e-5`), re-quantization wouldn't be required.
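   The tunable tolerance could reduce to a check like this (hypothetical names, assuming per-segment min/max quantile bounds are tracked):

   ```java
   // Illustrative merge-time check, not a Lucene API: only re-quantize when the
   // merged segment's quantile bounds drift beyond a configurable tolerance.
   public class RequantizeCheck {
     static boolean needsRequantization(
         float oldMin, float oldMax, float newMin, float newMax, float tolerance) {
       return Math.abs(oldMin - newMin) > tolerance
           || Math.abs(oldMax - newMax) > tolerance;
     }

     public static void main(String[] args) {
       // Bounds moved noticeably, so the raw vectors would be re-quantized.
       System.out.println(needsRequantization(0f, 1f, 0.2f, 1.1f, 1e-5f));
     }
   }
   ```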
   
   What would happen at search:
   
    - For each segment, the query vector will be quantized according to that 
segment's quantile info
    - Search by default would run over the quantized vectors, scoring 
similarities, potentially scaling the scores to fall within non-quantized ranges
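   The two search steps above might reduce to something like this (a sketch with made-up names; the integer dot product is where the latency win would come from):

   ```java
   // Hypothetical search-side flow, not a Lucene API: each segment quantizes
   // the incoming query with its own stored bounds, then scores candidates
   // with an integer dot product over quantized bytes.
   public class QuantizedScorer {
     final float minQuantile;
     final float maxQuantile;

     QuantizedScorer(float minQuantile, float maxQuantile) {
       this.minQuantile = minQuantile;
       this.maxQuantile = maxQuantile;
     }

     // Quantize the query with this segment's bounds, mirroring how the
     // segment's stored vectors were quantized at index time.
     byte[] quantizeQuery(float[] query) {
       byte[] out = new byte[query.length];
       float scale = 127f / (maxQuantile - minQuantile);
       for (int i = 0; i < query.length; i++) {
         float clamped = Math.min(maxQuantile, Math.max(minQuantile, query[i]));
         out[i] = (byte) Math.round((clamped - minQuantile) * scale);
       }
       return out;
     }

     // Integer dot product: cheaper than float comparisons and avoids the
     // float-buffer decoding overhead entirely.
     static int score(byte[] query, byte[] doc) {
       int sum = 0;
       for (int i = 0; i < query.length; i++) {
         sum += query[i] * doc[i];
       }
       return sum;
     }

     public static void main(String[] args) {
       QuantizedScorer segment = new QuantizedScorer(0f, 1f);
       byte[] q = segment.quantizeQuery(new float[] {0.1f, 0.9f});
       System.out.println(score(q, new byte[] {10, 20}));
     }
   }
   ```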
   
   Some open questions:
   
    - How do we handle `byte[]` searches & vectors? 
           - We could quantize those into half-bytes. Or just reject them for 
now.
    - Do we want to provide an option to re-score over non-quantized vectors? 
Or provide a way to access them directly?
    - How do we handle scoring from similarities?
       -  Dot products between non-quantized floats and quantized bytes are very 
different. Though, I think this is solvable for this codec (meaning scores will 
fall within similar ranges for non-quantized and quantized vectors).
    - Would we benefit from having a “quantization limit”? Meaning a segment 
isn’t quantized until some limit is reached? To support this, scoring would 
have to fall within similar ranges, as you don’t want scoring to be dramatically 
different between quantized and regular vectors.
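   On the dot-product scoring question above, here is one way the gap could be closed (a sketch assuming uniform scalar quantization with shared per-segment bounds, not a committed design). If each float component is reconstructed as $x_i \approx \alpha b^x_i + m$, where $b^x_i$ is the quantized byte, $\alpha = (\max - \min)/127$ is the scale, and $m$ is the lower bound, then for dimension $d$:
   
   $$x \cdot y \approx \alpha^2 \sum_{i=1}^{d} b^x_i b^y_i + \alpha m \left( \sum_{i=1}^{d} b^x_i + \sum_{i=1}^{d} b^y_i \right) + d\,m^2$$
   
   The per-vector component sums could be precomputed and stored at index time, so correcting the integer dot product back into float-score ranges would cost only a few multiply-adds per comparison. That would also help the "quantization limit" idea, since quantized and non-quantized scores would stay in comparable ranges.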
   
   Useful resources:
   https://qdrant.tech/articles/scalar-quantization/
   https://zilliz.com/blog/scalar-quantization-and-product-quantization 
   

