agorlenko commented on PR #11946: URL: https://github.com/apache/lucene/pull/11946#issuecomment-1320508166
> Can you explain why you want the "find all docs with score > T"?

For example, we want to show users only the documents that are suitable for them. We have a custom scorer (based on an ML model, for example) that calculates a score. We then compare that score with a threshold to decide whether the document is suitable for the user. But that scorer is usually too computationally expensive to run on every document that passes the filters.

To deal with this problem, we can build a second, much simpler model that selects candidates for the heavy model. One of the basic approaches to building that light model is kNN: we have a vector (embedding) for the user or the user's query, and a vector (embedding) for every document. So we just find the nearest documents and pass them to the heavy scorer. But in that case we don't know K; we know only the threshold, which is defined during development of the ranking model. Such tasks arise naturally in recommendation systems and in ranking.

> That is going to be a scary thing. What if someone asks for T==0? Then the computation and memory requirements are unbounded.

The same result can be achieved by setting K = 1000...00, so I don't think we add a new vulnerability here. Maybe it is worth adding a warning to the documentation (for K and for similarityThreshold).

If you still think that it's a bad idea to support such functionality in Lucene, I will rewrite this PR to the post-filter case. But I think it can be useful for people who add ML ranking to search systems based on Lucene.

-- 
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
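The two-stage scheme described in the comment (a cheap vector-similarity stage that selects candidates by threshold, followed by an expensive scorer) can be illustrated with a minimal sketch. This is not Lucene code and not the PR's implementation: it uses a brute-force scan with cosine similarity instead of an HNSW graph search, and every name in it (`select_candidates`, `rank`, `heavy_score`, `similarity_threshold`, `score_threshold`) is hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_candidates(query_vec, doc_vecs, similarity_threshold):
    """Stage 1 (light model): keep every document whose similarity to the
    query exceeds the threshold. The number of results is not fixed in advance."""
    return [doc_id for doc_id, vec in doc_vecs.items()
            if cosine(query_vec, vec) > similarity_threshold]


def rank(query_vec, doc_vecs, similarity_threshold, heavy_score, score_threshold):
    """Stage 2 (heavy model): run the expensive scorer on the candidates only,
    keep those above the ranking threshold, and sort by descending score."""
    candidates = select_candidates(query_vec, doc_vecs, similarity_threshold)
    scored = [(doc_id, heavy_score(doc_id)) for doc_id in candidates]
    kept = [(doc_id, s) for doc_id, s in scored if s > score_threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Toy data: three documents with 2-d embeddings and made-up heavy-model scores.
docs = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
heavy_scores = {"a": 0.4, "b": 0.9, "c": 0.95}
result = rank([1.0, 0.0], docs, 0.8, heavy_scores.get, 0.5)
```

In this toy run, document `c` is never passed to the heavy scorer because it fails the similarity threshold. Lowering `similarity_threshold` far enough admits every document, which is the same unbounded-cost situation as asking for a very large K.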
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org