dungba88 commented on PR #14009: URL: https://github.com/apache/lucene/pull/14009#issuecomment-2903403699
Let me rephrase it better. I'm not worried about the computation cost; as you mentioned, it is the same either way. At Amazon (where both Vigya and I work), we usually run KnnFloatVectorQuery disjunctively with keyword matching (such as TermQuery), along with other constraint filtering. We don't use the scores from either KnnFloatVectorQuery or TermQuery (we wrap them in ConstantScoreQuery); after we get all the hits from both KNN and keyword matching, we use our own scorer to compute the final top-k.

One thing I'm hoping to do here is, for KnnFloatVectorQuery with low-bit quantization, to get better recall by over-sampling, re-ranking with the full-precision vectors, and cutting the hits back to the original top-k. The end result is that KnnFloatVectorQuery still outputs top-k results, but with better recall thanks to the oversampling and reranking. Moreover, we don't have to rely on the final scorer to remove or down-rank the low-quality hits that may be introduced by the oversampling.

If we instead keep the full over-sampled set during the query match phase and only take the final top-k when collecting hits at the end, then:

- Wouldn't `Scorer.score()` be skipped (not called at all), so that the RerankQuery is essentially a no-op? Asking just for my education.
- It would not be possible (because of how Weight/Scorer are used) to limit the hits to a fixed K for the RerankQuery alone, before combining it conjunctively/disjunctively with other queries (similar to how AbstractKnnVectorQuery works), would it? If so, that approach might not work for this use case.

If we allow a way to flexibly choose whether or not to trim the results from RerankQuery, do you think that would be better and serve both of the use cases we have here?
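To make the oversample, rerank, and trim flow concrete, here is a minimal sketch in plain Python. It is not Lucene code and does not reflect how RerankQuery is implemented in this PR; the function names, the 1-bit sign quantizer, the `oversample` parameter, and the toy vectors are all assumptions made for illustration.

```python
import heapq


def dot(a, b):
    """Plain dot product, used here as the similarity function."""
    return sum(x * y for x, y in zip(a, b))


def quantize_1bit(vec):
    """Hypothetical low-bit quantizer: keep only the sign of each component."""
    return [1.0 if x >= 0 else -1.0 for x in vec]


def oversample_rerank_trim(query, quantized, full, k, oversample=3.0):
    """Return the top-k (doc_id, score) pairs after reranking.

    quantized and full map doc_id -> vector; oversample is a hypothetical
    multiplier controlling how many approximate candidates are fetched.
    """
    # Phase 1: approximate search over the quantized vectors, fetching more
    # than k candidates so that near misses still have a chance.
    n_candidates = max(k, int(k * oversample))
    candidates = heapq.nlargest(
        n_candidates, quantized, key=lambda doc: dot(query, quantized[doc])
    )
    # Phase 2: rescore only those candidates with full-precision vectors.
    rescored = [(dot(query, full[doc]), doc) for doc in candidates]
    # Phase 3: trim back to k, so downstream consumers still see k hits.
    return [(doc, score) for score, doc in heapq.nlargest(k, rescored)]


# Toy data (illustrative only).
FULL = {
    0: [0.9, 0.1],
    1: [0.1, 0.9],
    2: [1.0, -0.1],
    3: [-0.5, 0.5],
}
QUANTIZED = {doc: quantize_1bit(vec) for doc, vec in FULL.items()}
QUERY = [1.0, 0.2]

without_oversampling = oversample_rerank_trim(QUERY, QUANTIZED, FULL, k=1, oversample=1.0)
with_oversampling = oversample_rerank_trim(QUERY, QUANTIZED, FULL, k=1, oversample=3.0)
print(without_oversampling)
print(with_oversampling)
```

In this toy data the quantized scores rank doc 0 first, so the search without oversampling returns doc 0, while the oversampled search also pulls in doc 2, which wins once full-precision scores are used. Both calls return exactly one hit, which is the property that matters when the result is later combined with other clauses.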
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org