dungba88 commented on issue #13564:
URL: https://github.com/apache/lucene/issues/13564#issuecomment-2499978963

   I think there are still 2 issues to address:
   - Prevent quantized vectors from being swapped out: Loading full-precision 
vectors is costly and can cause the quantized vectors to be swapped out if the 
OS is under memory pressure. Maybe we can use something similar to `mlock` if 
the system supports it. But I guess this can be left to developers instead of 
being built into the re-ranking Query.
   - The latency could be better. I'm still running a thorough benchmark with 
KnnGraphTester, but preliminary results show the re-ranking step adds 
considerable latency. Maybe we can execute the re-ranking per segment in 
parallel, or apply some other optimization. Another thing is that we are 
running the rewrite phase and createRewrittenQuery twice: once for the main 
search phase and once for the re-ranking phase. Not sure how much overhead that 
introduces.
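
   To make the "re-ranking per segment in parallel" idea concrete, here is a 
rough sketch in plain Java. It is not Lucene code: `ParallelRerankSketch`, 
`Candidate`, `Scored` and `rerank` are hypothetical names invented for 
illustration, the full-precision vectors are assumed to be already available 
per candidate, and dot product is assumed as the similarity. It only shows the 
shape of the approach: one task per segment, then a merge into a global top-k.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

/** Hypothetical sketch; none of these types are Lucene classes. */
final class ParallelRerankSketch {

    /** A hit from the first (quantized) pass, with its full-precision vector. */
    record Candidate(int segment, int doc, float[] fullPrecisionVector) {}

    /** A hit re-scored with the full-precision vector. */
    record Scored(int segment, int doc, float score) {}

    /** Assumed similarity: plain dot product. */
    static float dotProduct(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    /** Re-scores each segment's candidates in its own task, then merges the top k. */
    static List<Scored> rerank(float[] query, List<List<Candidate>> perSegment,
                               int k, ExecutorService executor) {
        List<Future<List<Scored>>> futures = new ArrayList<>();
        for (List<Candidate> segmentCandidates : perSegment) {
            Callable<List<Scored>> task = () -> {
                List<Scored> scored = new ArrayList<>();
                for (Candidate c : segmentCandidates) {
                    float score = dotProduct(query, c.fullPrecisionVector());
                    scored.add(new Scored(c.segment(), c.doc(), score));
                }
                return scored;
            };
            futures.add(executor.submit(task));
        }

        // Collect the per-segment results once all tasks have been submitted.
        List<Scored> merged = new ArrayList<>();
        for (Future<List<Scored>> future : futures) {
            try {
                merged.addAll(future.get());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while re-ranking", e);
            } catch (ExecutionException e) {
                throw new IllegalStateException("re-ranking task failed", e.getCause());
            }
        }

        // Highest score first, then keep at most k hits.
        merged.sort((a, b) -> Float.compare(b.score(), a.score()));
        return new ArrayList<>(merged.subList(0, Math.min(k, merged.size())));
    }
}
```

   Whether this helps in practice would depend on how expensive loading the 
full-precision vectors is relative to the task overhead, which is what the 
benchmark should tell us.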


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

