mccullocht commented on PR #15708:
URL: https://github.com/apache/lucene/pull/15708#issuecomment-3924607405

   This API is probably fine for the described purpose, but I'm skeptical about 
how useful it will be. Recall improvements diminish pretty quickly when the 
query bit rate increases while the doc bit rate stays fixed. I'm optimistic 
that we could do more to improve recall and performance without exposing this 
kind of parameter.
   
   To obey the proposed API, we would need to be able to compare two vectors of 
different bit rates for any pair of bit rates up to, say, 8 bits/dim. The 
transpose + popcount strategy that we employ for bit and dibit works up to 
somewhere around 4-8 comparisons/dimension. Once the number of comparisons 
grows larger than that, it starts to become cheaper to perform a dot product, 
and how well that works depends a lot on how the vectors are packed. The 
current 1-bit packing scheme in particular would be difficult to compare 
against vectors of other bit rates because of how hard it would be to unpack 
into the same dimension order as something else. The same problem exists if 
you look at extending the doc vector with a quantized residual as described in 
the [LVQ paper](https://arxiv.org/pdf/2304.04759).
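
   For readers unfamiliar with the transpose + popcount strategy mentioned 
above, here is a minimal sketch of the general idea for a 1-bit doc vector and 
a multi-bit query. The class and method names (`BitPlaneDot`, `packBits`, 
`transpose`, `dot`) and the simple "dimension i goes to bit i" packing are 
assumptions made for illustration. They are not Lucene's actual packing or API.

```java
// Illustrative sketch only: names and bit layout are hypothetical, not Lucene's.
final class BitPlaneDot {
  /** Number of 64-bit words needed to hold one bit per dimension. */
  static int words(int dims) {
    return (dims + 63) >>> 6;
  }

  /** Packs 1-bit doc values (0 or 1): dimension i goes to bit (i % 64) of word i / 64. */
  static long[] packBits(int[] doc) {
    long[] packed = new long[words(doc.length)];
    for (int i = 0; i < doc.length; i++) {
      if (doc[i] != 0) {
        packed[i >>> 6] |= 1L << (i & 63);
      }
    }
    return packed;
  }

  /** Transposes query values into bit planes: planes[b] holds bit b of every dimension. */
  static long[][] transpose(int[] query, int queryBits) {
    long[][] planes = new long[queryBits][words(query.length)];
    for (int i = 0; i < query.length; i++) {
      for (int b = 0; b < queryBits; b++) {
        if (((query[i] >>> b) & 1) != 0) {
          planes[b][i >>> 6] |= 1L << (i & 63);
        }
      }
    }
    return planes;
  }

  /** Integer dot product: one AND + popcount pass per query bit plane, weighted by 2^b. */
  static long dot(long[][] queryPlanes, long[] docBits) {
    long sum = 0;
    for (int b = 0; b < queryPlanes.length; b++) {
      long planeCount = 0;
      for (int w = 0; w < docBits.length; w++) {
        planeCount += Long.bitCount(queryPlanes[b][w] & docBits[w]);
      }
      sum += planeCount << b;
    }
    return sum;
  }
}
```

   With a doc vector of more than one bit per dimension, the same approach 
needs one pass per (query plane, doc plane) pair, so the work grows as the 
product of the two bit rates. A 4-bit query against a dibit doc is already 8 
plane pairs, which is consistent with the 4-8 comparisons/dimension range 
mentioned above.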
   
   I have another idea that is inspired by placing statistical bounds on 
estimated distance as described in the RaBitQ paper -- the idea is that if a 
`minSimilarity` parameter were passed to `score()`, the scorer might be able 
to eliminate certain candidates after examining only 1 bit of a 4-bit query 
vector. I'll file an issue for this once I have a better handle on the math.
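
   To make the shape of that idea concrete, here is a rough sketch under 
strong assumptions. It uses a simple deterministic upper bound on the raw 
integer dot product rather than the statistical bounds from the RaBitQ paper, 
and it ignores any corrective terms a real quantized scorer would apply. The 
names `EarlyExitSketch`, `scoreOrPrune`, `minDot`, and `PRUNED` are 
hypothetical, and the bit-plane layout (plane b holds bit b of every query 
dimension, doc is 1 bit/dim) is assumed for illustration.

```java
// Illustrative sketch only: a deterministic bound, not the RaBitQ statistical bound.
final class EarlyExitSketch {
  /** Sentinel returned when a candidate is eliminated early (real scores are >= 0). */
  static final long PRUNED = -1;

  /**
   * Scores a multi-bit query (as bit planes) against a 1-bit doc vector, but
   * gives up after the most significant plane if minDot cannot be reached.
   */
  static long scoreOrPrune(long[][] queryPlanes, long[] docBits, long minDot) {
    int top = queryPlanes.length - 1;
    long docOnes = 0;
    long topCount = 0;
    for (int w = 0; w < docBits.length; w++) {
      docOnes += Long.bitCount(docBits[w]);
      topCount += Long.bitCount(queryPlanes[top][w] & docBits[w]);
    }
    long partial = topCount << top;

    // Each lower plane b can match at most docOnes bits, contributing at most
    // docOnes << b. Summed over b = 0..top-1 that is docOnes * (2^top - 1).
    long upperBound = partial + docOnes * ((1L << top) - 1);
    if (upperBound < minDot) {
      return PRUNED;
    }

    // Candidate survived the bound: finish the remaining planes exactly.
    long sum = partial;
    for (int b = 0; b < top; b++) {
      long planeCount = 0;
      for (int w = 0; w < docBits.length; w++) {
        planeCount += Long.bitCount(queryPlanes[b][w] & docBits[w]);
      }
      sum += planeCount << b;
    }
    return sum;
  }
}
```

   A bound this loose would probably prune few candidates in practice. A 
statistical bound of the kind described in the RaBitQ paper would presumably 
be tighter, at the cost of a small probability of wrongly eliminating a 
candidate.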


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

