[ https://issues.apache.org/jira/browse/LUCENE-6172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Adrien Grand updated LUCENE-6172:
---------------------------------
    Attachment: LUCENE-6172.patch

Here is an (in-progress) patch which should give an idea of what I'm trying
to do. The interesting bits are mainly in Top(Docs|Field)Collector,
IndexSearcher and BooleanWeight. There is one failing lucene/facets test and
a couple of failing Solr tests that I still need to understand.

> Improve the in-order / out-of-order collection decision process
> ---------------------------------------------------------------
>
>                 Key: LUCENE-6172
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6172
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Assignee: Adrien Grand
>            Priority: Minor
>             Fix For: 5.0, Trunk
>
>         Attachments: LUCENE-6172.patch
>
>
> Today the logic is the following:
>  - IndexSearcher checks whether the weight can score out-of-order
>  - Depending on the answer, it creates the appropriate top docs/field
> collector
> I think this has several issues:
>  - Only IndexSearcher can actually make the decision correctly, and it only
> works for top docs/field collectors. If you want to make a multi collector in
> order to have both facets and top docs, then you have no way to know whether
> you should create a top docs collector that supports out-of-order collection
>  - It is quite fragile: you need to make sure that
> Weight.scoresDocsOutOfOrder and Weight.bulkScorer agree on when they can
> score out-of-order. Some queries like BooleanQuery duplicate the logic and
> other queries like FilteredQuery just always return true to avoid complexity.
> This is inefficient, as it means that IndexSearcher will create a collector
> that supports out-of-order collection while the common case actually scores
> documents in order (leap frog between the query and the filter).
> Instead I would like to take advantage of the new collection API to make
> out-of-order scoring an implementation detail of the bulk scorers.
> My current idea is as follows:
>  - remove Weight.scoresDocsOutOfOrder
>  - change Collector.getLeafCollector(LeafReaderContext) to
> Collector.getLeafCollector(LeafReaderContext, boolean canScoreOutOfOrder)
> This new boolean in Collector.getLeafCollector tells the collector that the
> scorer supports out-of-order scoring. So by returning a leaf collector that
> supports out-of-order collection, things will be faster.
> The new logic would be the following. First, a Weight no longer reports
> whether it supports out-of-order scoring. However, when a weight knows it
> supports out-of-order scoring, it will pass canScoreOutOfOrder=true when
> getting the leaf collector. If the returned collector accepts documents out
> of order, then the weight will return an out-of-order scorer. Otherwise, an
> in-order scorer is returned.
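
To make the proposed negotiation concrete, here is a minimal sketch in plain Java. It does not use Lucene: the types below are simplified stand-ins with hypothetical names, the LeafReaderContext argument is omitted, and the acceptsDocsOutOfOrder method on the leaf collector is an assumption about how the weight would learn what the collector accepts. "Out-of-order" scoring is simulated by emitting doc IDs in reverse.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustration only: simplified stand-ins, NOT Lucene's real classes. */
final class OutOfOrderNegotiationSketch {

    /** Hypothetical per-segment collector. */
    interface LeafCollector {
        void collect(int doc);
        /** Assumed way for the scorer side to ask what the collector tolerates. */
        boolean acceptsDocsOutOfOrder();
    }

    /** Hypothetical collector carrying the proposed extra boolean. */
    interface Collector {
        LeafCollector getLeafCollector(boolean canScoreOutOfOrder);
    }

    /** Accepts out-of-order docs whenever the scorer says it could produce them. */
    static final class FlexibleCollector implements Collector {
        final List<Integer> collected = new ArrayList<>();

        @Override
        public LeafCollector getLeafCollector(final boolean canScoreOutOfOrder) {
            return new LeafCollector() {
                @Override public void collect(int doc) { collected.add(doc); }
                @Override public boolean acceptsDocsOutOfOrder() { return canScoreOutOfOrder; }
            };
        }
    }

    /** Always requires increasing doc IDs, whatever the scorer could do. */
    static final class InOrderOnlyCollector implements Collector {
        final List<Integer> collected = new ArrayList<>();

        @Override
        public LeafCollector getLeafCollector(boolean canScoreOutOfOrder) {
            return new LeafCollector() {
                @Override public void collect(int doc) { collected.add(doc); }
                @Override public boolean acceptsDocsOutOfOrder() { return false; }
            };
        }
    }

    /**
     * Weight-side decision: advertise the capability, then pick the scorer
     * based on what the returned leaf collector accepts.
     * Returns the kind of scorer that was used.
     */
    static String score(Collector collector, boolean weightCanScoreOutOfOrder, int[] sortedMatches) {
        LeafCollector leaf = collector.getLeafCollector(weightCanScoreOutOfOrder);
        if (weightCanScoreOutOfOrder && leaf.acceptsDocsOutOfOrder()) {
            // Simulated out-of-order scorer: emit matches in reverse.
            for (int i = sortedMatches.length - 1; i >= 0; i--) {
                leaf.collect(sortedMatches[i]);
            }
            return "out-of-order";
        }
        // In-order scorer: emit matches in increasing doc ID order.
        for (int doc : sortedMatches) {
            leaf.collect(doc);
        }
        return "in-order";
    }

    public static void main(String[] args) {
        int[] matches = {1, 4, 7};

        FlexibleCollector flexible = new FlexibleCollector();
        System.out.println(score(flexible, true, matches) + " " + flexible.collected);
        // prints: out-of-order [7, 4, 1]

        InOrderOnlyCollector strict = new InOrderOnlyCollector();
        System.out.println(score(strict, true, matches) + " " + strict.collected);
        // prints: in-order [1, 4, 7]
    }
}
```

In this sketch the decision is made per leaf collector at scoring time, so a wrapping collector (for example one combining facets and top docs) could take part simply by forwarding the boolean and reporting what its children accept; no up-front Weight.scoresDocsOutOfOrder check is needed.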