[ https://issues.apache.org/jira/browse/LUCENE-1252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14178374#comment-14178374 ]
Robert Muir commented on LUCENE-1252:
-------------------------------------

Both of those are covered. Yes, it's proposed as a first-class citizen (on DISI), more first-class than your "second-class" rewrite() solution. It might be good to first look at what BQ and co. do with query execution today.

The fact is, rewrite() is inappropriate. The main reason rewrite() is second-class here is that scoring is *per-segment* in Lucene. In trunk today this means:
* Filters might get executed with a completely different "plan" for different segments (conjunction versus Bits->acceptDocs).
* Scorers might get returned as null for some segments (e.g. the term doesn't exist). BQ doesn't have special null handling in every subscorer that it uses; that would be horrible. Instead it eliminates such scorers immediately in BooleanWeight's constructor.
* This, like the above, can impact the structure of what BQ does (e.g. for A or B, if B is null, it just returns A instead of a DisjunctionScorer).
* Scorers have a cost() API, which tells us additional critical information for execution. I've proposed expanding this to much more (additional flags/methods) so that BQ can be a quarterback and not the entire football team.

Another reason rewrite() is wrong here is that such two-phase execution is only useful for conjunctions (or conjunction-like queries such as MinShouldMatch). Otherwise, it's useless. Rewriting to two queries when the query is not a conjunction will not help performance.

At the low level, I can't see a rewrite() solution being efficient. Today ExactPhraseScorer is already coded as:
{code}
while (iterate approximation...) {
  confirm(); // phraseFreq > 0
}
{code}
So in this case, it's already ready to be exposed for two-phase execution. On the other hand, with rewrite(), it would either be 2 separate D&Penums, or a mess?

Finally, by having this generic on DISI, this "approximation" becomes much more flexible.
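
To make the approximate-then-confirm loop above concrete, here is a minimal Java sketch of how a conjunction could drive two-phase execution. It is an illustration under assumptions, not Lucene code: `ApproxIterator`, `ArrayApprox`, `advance()`, `matches()` and `conjunction()` are hypothetical names invented for this example, and the "phrase" clause is simulated with a sorted array of candidate docs plus a predicate standing in for the position check.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

public class TwoPhaseSketch {
    public static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Hypothetical interface (not a Lucene API): cheap approximation plus expensive confirmation. */
    public interface ApproxIterator {
        /** Moves to the first candidate doc >= target and returns it, or NO_MORE_DOCS. */
        int advance(int target);

        /** Expensive check of the current candidate, e.g. phraseFreq > 0. */
        boolean matches();
    }

    /** Toy clause: sorted candidate doc IDs, with a predicate standing in for position checks. */
    public static final class ArrayApprox implements ApproxIterator {
        private final int[] candidates;
        private final IntPredicate confirm;
        private int idx = 0;
        public int confirmCalls = 0; // counts how often the expensive check ran

        public ArrayApprox(int[] candidates, IntPredicate confirm) {
            this.candidates = candidates;
            this.confirm = confirm;
        }

        @Override
        public int advance(int target) {
            while (idx < candidates.length && candidates[idx] < target) {
                idx++;
            }
            return idx < candidates.length ? candidates[idx] : NO_MORE_DOCS;
        }

        @Override
        public boolean matches() {
            confirmCalls++;
            return confirm.test(candidates[idx]);
        }
    }

    /** Leapfrogs the two approximations and confirms only where both agree on a doc. */
    public static List<Integer> conjunction(ApproxIterator a, ApproxIterator b) {
        List<Integer> hits = new ArrayList<>();
        int doc = a.advance(0);
        while (doc != NO_MORE_DOCS) {
            int other = b.advance(doc);
            if (other == NO_MORE_DOCS) {
                break;
            }
            if (other != doc) {
                doc = a.advance(other); // approximations disagree: skip ahead, no confirmation
                continue;
            }
            if (a.matches() && b.matches()) { // pay for the expensive check only now
                hits.add(doc);
            }
            doc = a.advance(doc + 1);
        }
        return hits;
    }

    public static void main(String[] args) {
        // "Phrase" clause: both terms occur in docs 1,3,5,8,9; the phrase itself only in 3 and 9.
        ArrayApprox phrase = new ArrayApprox(new int[] {1, 3, 5, 8, 9}, d -> d == 3 || d == 9);
        // Plain term clause: every candidate is a real match.
        ArrayApprox term = new ArrayApprox(new int[] {3, 4, 8, 10}, d -> true);
        System.out.println(conjunction(phrase, term) + " confirms=" + phrase.confirmCalls);
        // prints: [3] confirms=2
    }
}
```

In this toy run the phrase clause is confirmed on only two documents (3 and 8) instead of all five of its candidates, which is the saving the issue is after. A non-conjunctive query has no second clause to rule candidates out, so every candidate would still need confirming.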
For example, a postings format could implement it for its docsenums, if it is able to implement a cheap approximation... perhaps via skiplist-like data, or perhaps with an explicit "approximate" cache mechanism specifically for this purpose.

Another example is a filter like a geographic distance filter, which could return a bounding box here automatically rather than forcing the user to deal with this themselves with complex query/filter hacks. At the same time, such a filter could signal that it has a linear-time nextDoc, so that BooleanQuery can really do the best execution.

A rewrite() solution really makes this hard, because the two things would then be completely separate queries. But if you look at all the cases of executing this logic, it's very useful for BQ to know that A is a superset of B.

> Avoid using positions when not all required terms are present
> -------------------------------------------------------------
>
>                 Key: LUCENE-1252
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1252
>             Project: Lucene - Core
>          Issue Type: Wish
>          Components: core/search
>            Reporter: Paul Elschot
>            Priority: Minor
>              Labels: gsoc2014
>
> In the Scorers of queries with (lots of) Phrases and/or (nested) Spans,
> currently next() and skipTo() will use position information even when other
> parts of the query cannot match because some required terms are not present.
> This could be avoided by adding some methods to Scorer that relax the
> postcondition of next() and skipTo() to something like "all required terms
> are present, but no position info was checked yet", and implementing these
> methods for Scorers that do conjunctions: BooleanScorer, PhraseScorer, and
> SpanScorer/NearSpans.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)