I think this is the same pro-rated idea from LUCENE-8681: when the documents are randomly distributed among segments, the prediction can be quite accurate. In the case of a time series index, though (or any index where the distribution among segments is correlated with the rank), this approach to early termination is not directly applicable.
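
To make the pro-rated idea concrete, here is a minimal sketch of how a global top-N budget might be split across segments in proportion to their size, plus a static overhead. It is not the LUCENE-8681 code or the patch under discussion; the class name `ProRatedTopN`, the method `perSegmentBudget`, and the `overhead` parameter are all hypothetical names chosen for illustration.

```java
import java.util.Arrays;

// Hypothetical illustration only; none of these names exist in Lucene.
final class ProRatedTopN {

    /**
     * Splits a global topN budget across segments in proportion to each
     * segment's document count, then adds a fixed overhead per segment.
     * Assumes hits are spread randomly across segments.
     */
    static int[] perSegmentBudget(int topN, int[] segmentDocCounts, int overhead) {
        long totalDocs = 0;
        for (int count : segmentDocCounts) {
            totalDocs += count;
        }
        int[] budgets = new int[segmentDocCounts.length];
        for (int i = 0; i < segmentDocCounts.length; i++) {
            // Expected share of the global top N that lands in this segment.
            long share = totalDocs == 0
                    ? 0
                    : (long) Math.ceil((double) topN * segmentDocCounts[i] / totalDocs);
            // Never collect more than topN, or more than the segment holds.
            long cap = Math.min(topN, segmentDocCounts[i]);
            budgets[i] = (int) Math.min(cap, share + overhead);
        }
        return budgets;
    }

    public static void main(String[] args) {
        int[] budgets = perSegmentBudget(100, new int[] {500_000, 300_000, 200_000}, 10);
        System.out.println(Arrays.toString(budgets)); // prints [60, 40, 30]
    }
}
```

With a time series index the random-distribution assumption breaks: if nearly all of the top 100 hits live in the newest (say, smallest) segment, its budget of 30 is too low, and no budget short of the full top N is safe there.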

On Fri, Jun 7, 2019 at 4:27 AM Adrien Grand (JIRA) <[email protected]> wrote:

>
>     [ https://issues.apache.org/jira/browse/LUCENE-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16858391#comment-16858391 ]
>
> Adrien Grand commented on LUCENE-8791:
> --------------------------------------
>
> My concern is that we are introducing complexity for a use-case that seems to
> be collecting a number of hits in the first pass that is above what I would
> consider reasonable. I understand why you might want to set an executor
> service on an IndexSearcher since it needs to do work over potentially
> millions of hits. However, the order of magnitude of the number of documents
> that I'm expecting rescoring to have to look at is more in the order of
> hundreds and should run in a few milliseconds only already with a single
> thread.
>
> That concern aside, I'm supportive of the ability of rescoring hits based on
> a collector, which adds flexibility compared to what you can do with a query
> rescorer. I won't veto this change, but could we make the single-threaded
> constructor take a Collector rather than a CollectorManager and document that
> the one that takes an ExecutorService is expert and usually doesn't help
> significantly?
>
> bq. We distribute total number of results we are looking from matching across
> segments evenly plus some static number for overhead
>
> This way of working would be a nice enhancement to IndexSearcher when
> constructed with an ExecutorService! Do you just accept the decrease of
> precision if one slice didn't collect enough hits, or do you re-run the
> search on this slice with a greater number of hits?
>
> > Add CollectorRescorer
> > ---------------------
> >
> >                 Key: LUCENE-8791
> >                 URL: https://issues.apache.org/jira/browse/LUCENE-8791
> >             Project: Lucene - Core
> >          Issue Type: Improvement
> >            Reporter: Elbek Kamoliddinov
> >            Priority: Major
> >         Attachments: LUCENE-8791.patch, LUCENE-8791.patch,
> >                      LUCENE-8791.patch, LUCENE-8791.patch
> >
> >
> > This is another implementation of query rescorer api (LUCENE-5489). It adds
> > rescoring functionality based on provided CollectorManager.
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v7.6.3#76005)
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
