dungba88 commented on issue #14984: URL: https://github.com/apache/lucene/issues/14984#issuecomment-3111827170
Somewhat related: for the re-scoring phase, I think keeping the query vector at 32-bit and computing the dot product directly against the 1-bit/4-bit/7-bit vectors may give a better latency/recall trade-off than quantizing the query first. In the benchmark in https://github.com/apache/lucene/pull/14009, dot_product(32bit, 32bit) added only a very small latency, but the added latency was much higher for (7bit, 7bit) because of the quantization cost, which becomes more prominent when the number of vectors to score is small.
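To illustrate the difference between the two scoring paths, here is a minimal sketch in plain Python. It assumes a simple uniform scalar quantizer with a shared [lo, hi] range; the function names (`quantize`, `asymmetric_dot`, `symmetric_dot`) are hypothetical and are not Lucene's API, and Lucene's actual quantization scheme may differ. The point is only that the asymmetric path scores a float query against stored integer codes without the per-query quantization step.

```python
def quantize(vec, lo, hi, bits=7):
    """Uniformly quantize floats in [lo, hi] to integer codes in [0, 2**bits - 1].

    Assumption: one shared range for all vectors (illustrative only).
    """
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels
    codes = [min(levels, max(0, round((x - lo) / scale))) for x in vec]
    return codes, scale


def asymmetric_dot(query, codes, lo, scale):
    """Float query against quantized document codes; the query stays unquantized.

    Each stored value is approximated as lo + scale * c, so
    sum(q_i * (lo + scale * c_i)) = lo * sum(q) + scale * sum(q_i * c_i).
    """
    return lo * sum(query) + scale * sum(q * c for q, c in zip(query, codes))


def symmetric_dot(query, codes, lo, hi, bits=7):
    """Quantized query against quantized document codes.

    The quantize() call below is the extra per-query cost that the
    asymmetric path avoids.
    """
    q_codes, scale = quantize(query, lo, hi, bits)
    n = len(codes)
    int_dot = sum(a * b for a, b in zip(q_codes, codes))
    # sum((lo + s*a_i) * (lo + s*b_i)) expanded term by term
    return n * lo * lo + lo * scale * (sum(q_codes) + sum(codes)) + scale * scale * int_dot
```

In this sketch, the asymmetric path only carries the quantization error of the stored vector, while the symmetric path adds the error of the quantized query on top of the extra quantization work.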