[jira] [Commented] (LUCENE-1252) Avoid using positions when not all required terms are present

Robert Muir (JIRA) Sun, 19 Oct 2014 15:02:51 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-1252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14176451#comment-14176451
 ]


Robert Muir commented on LUCENE-1252:
-------------------------------------

I think we should keep the issue open; I know I've been thinking about this one 
a lot lately. 

The thing I see is, for it to work really nicely, BooleanQuery really needs to 
own execution of both queries and filters.

Some kind of blurry proposal/plan like this:

Execute filters by BooleanQuery instead of its mini-me (FilteredQuery), e.g. as 
an additional type of BooleanClause. 

Merge Filter and Weight, in some way that makes sense, e.g. maybe just make 
Weight.scorer(LeafReaderContext context, Bits acceptDocs) a covariant-return 
override of Filter.getDocIdSet(LeafReaderContext context, Bits acceptDocs). 
Make sure any "wrappers" like ConstantScore delegate any new APIs correctly.
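
One possible reading of the covariant-return idea, as a rough sketch only. All 
types below are simplified stand-ins written for illustration; they are not the 
real Lucene classes, they drop the (LeafReaderContext, Bits) parameters, and 
they use a single method name so that the override is legal Java:

```java
public class CovariantSketch {
    /** Stand-in for a set of matching doc ids (hypothetical, not Lucene's DocIdSet). */
    static class DocIdSet {
        final int[] docs;
        DocIdSet(int[] docs) { this.docs = docs; }
    }

    /** Stand-in scorer: a doc id set that can also produce scores. */
    static class Scorer extends DocIdSet {
        Scorer(int[] docs) { super(docs); }
        float score(int doc) { return 1.0f; }
    }

    /** A filter only promises a set of matching docs. */
    static abstract class Filter {
        abstract DocIdSet getDocIdSet();
    }

    /** A weight is a filter whose result can also score: the return type narrows. */
    static abstract class Weight extends Filter {
        @Override
        abstract Scorer getDocIdSet(); // covariant return: Scorer is a DocIdSet
    }

    /** Toy weight that matches a fixed set of documents. */
    static class ConstantWeight extends Weight {
        @Override
        Scorer getDocIdSet() { return new Scorer(new int[] {1, 4, 7}); }
    }

    public static void main(String[] args) {
        Filter asFilter = new ConstantWeight();   // callers that only filter
        Weight asWeight = new ConstantWeight();   // callers that also score
        System.out.println(asFilter.getDocIdSet().docs.length);
        System.out.println(asWeight.getDocIdSet().score(4));
    }
}
```

With this shape, code that only needs matching docs can treat any Weight as a 
Filter, while scoring code still gets a Scorer back without a cast.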

Add bulk methods like and/or/not to Filter such that optimized impls like 
FixedBitSet.and() can be used. Since Java 7u40 these get autovectorized by 
HotSpot and are a valid strategy. I think maybe some of these could be 
optimized by sparse bitset impls as well. 
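
To make the bulk idea concrete, here is a minimal sketch that uses 
java.util.BitSet as a stand-in for FixedBitSet. The class and helper names are 
made up for illustration; the point is only that and/or/not work on whole words 
of bits at a time rather than advancing one document at a time:

```java
import java.util.BitSet;

public class BulkFilterOps {
    // Intersection: docs accepted by both filters, computed word-at-a-time.
    static BitSet and(BitSet a, BitSet b) {
        BitSet result = (BitSet) a.clone();
        result.and(b);
        return result;
    }

    // Union: docs accepted by either filter.
    static BitSet or(BitSet a, BitSet b) {
        BitSet result = (BitSet) a.clone();
        result.or(b);
        return result;
    }

    // Exclusion: docs in a that are not in b (e.g. b holds deleted docs).
    static BitSet andNot(BitSet a, BitSet b) {
        BitSet result = (BitSet) a.clone();
        result.andNot(b);
        return result;
    }

    public static void main(String[] args) {
        BitSet a = new BitSet();
        a.set(1); a.set(3); a.set(5);
        BitSet b = new BitSet();
        b.set(3); b.set(5); b.set(7);

        System.out.println(and(a, b));    // {3, 5}
        System.out.println(or(a, b));     // {1, 3, 5, 7}
        System.out.println(andNot(a, b)); // {1}
    }
}
```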

Create an enhanced cost metric/execution API for filters. BooleanQuery needs 
this additional context to give the most efficient execution. At the least, it 
should have the information to know to do the bulk optimizations above, and 
even apply deletes this way if it's appropriate (in Lucene 5 deleted docs are a 
FixedBitSet). I would also want a way to indicate that a Filter has a 
linear-time nextDoc(). These cases (e.g. filtering by exact geographic 
distance) are horrible to support, but handling them correctly (e.g. in a final 
phase) is a lesser evil than having the API be crazy so that systems like 
Solr/ES can do them with hacks.
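
A rough sketch of what such a planning step might look like. FilterInfo, its 
fields, and plan() are all hypothetical names invented here, not an existing or 
proposed Lucene API. The assumption is simply that each filter reports an 
estimated cost plus a flag for a linear-time nextDoc(), and that flagged 
filters are pushed to a final phase:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class FilterPlanSketch {
    /** Hypothetical per-filter execution hints. */
    static final class FilterInfo {
        final String name;
        final long cost;             // estimated number of matching docs
        final boolean linearNextDoc; // true if nextDoc() has to check every doc

        FilterInfo(String name, long cost, boolean linearNextDoc) {
            this.name = name;
            this.cost = cost;
            this.linearNextDoc = linearNextDoc;
        }
    }

    /** Orders filters so linear-time ones run last; the rest run by ascending cost. */
    static List<FilterInfo> plan(List<FilterInfo> filters) {
        List<FilterInfo> ordered = new ArrayList<>(filters);
        ordered.sort(Comparator
                .comparing((FilterInfo f) -> f.linearNextDoc) // false sorts before true
                .thenComparingLong(f -> f.cost));
        return ordered;
    }

    public static void main(String[] args) {
        List<FilterInfo> filters = Arrays.asList(
                new FilterInfo("geoDistance", 1_000, true),
                new FilterInfo("category", 50_000, false),
                new FilterInfo("inStock", 200, false));

        for (FilterInfo f : plan(filters)) {
            System.out.println(f.name);
        }
        // Prints inStock, category, geoDistance (one per line).
    }
}
```

Note that geoDistance goes last even though its cost estimate is low: the flag, 
not the cost, is what marks it as something to verify only on docs that 
survived everything else.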

Remove stuff like FilteredQuery, BooleanFilter, etc. 

Fix LUCENE-3331 (or impl in some other way), such that "scores are not needed" 
is passed down the query execution stack. The tricky part is BQ's "execution 
plan" is currently in two places really, rewrite() and Weight.scorer(). And I 
really think it needs the freedom to be able to completely restructure queries 
for performance (across nested BQ as well). Another option is to set up 
internal infra so BooleanWeight.scorer() can do this, as it has cost() 
knowledge too, but it feels so wrong.

Finally, we should add some support for "two-phase execution" via 
DISI.getSuperSet() or some other approximation. ConjunctionScorer could both 
use (when at least one sub supports) and implement this method (when e.g. coord 
scoring prevents optimal restructuring and it's nested) for faster AND/filtering 
of phrase/sloppy/spans/whatever, or for any other custom query/filter that 
supports a fast approximation.
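
The two-phase idea can be sketched roughly as follows. This is an illustration 
under simplifying assumptions, not Lucene code: the approximations are plain 
sorted int arrays of doc ids (a superset of the true matches), and the 
expensive position check is modeled as an IntPredicate. All names are 
hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

public class TwoPhaseSketch {
    /** Phase 1: intersect two sorted approximations (supersets of the real matches). */
    static List<Integer> intersect(int[] a, int[] b) {
        List<Integer> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.length && j < b.length) {
            if (a[i] < b[j]) {
                i++;
            } else if (a[i] > b[j]) {
                j++;
            } else {
                out.add(a[i]);
                i++;
                j++;
            }
        }
        return out;
    }

    /** Phase 2: run the expensive confirmation only on docs that survived phase 1. */
    static List<Integer> match(int[] approxA, int[] approxB, IntPredicate confirm) {
        List<Integer> hits = new ArrayList<>();
        for (int doc : intersect(approxA, approxB)) {
            if (confirm.test(doc)) {
                hits.add(doc);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        int[] phraseTermsPresent = {2, 4, 8, 9}; // docs containing all phrase terms
        int[] otherRequiredClause = {4, 9, 12};  // docs matching another required clause
        int[] positionChecks = {0};              // counts how often positions are read

        // Pretend the positions line up as a phrase only in doc 9.
        List<Integer> hits = match(phraseTermsPresent, otherRequiredClause, doc -> {
            positionChecks[0]++;
            return doc == 9;
        });

        System.out.println(hits + " after " + positionChecks[0] + " position checks");
        // Prints: [9] after 2 position checks
    }
}
```

Without the approximation phase, positions would be checked for all four docs 
that contain the phrase terms; here they are only read for the two docs that 
the other required clause also accepts.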




> Avoid using positions when not all required terms are present
> -------------------------------------------------------------
>
>                 Key: LUCENE-1252
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1252
>             Project: Lucene - Core
>          Issue Type: Wish
>          Components: core/search
>            Reporter: Paul Elschot
>            Priority: Minor
>              Labels: gsoc2014
>
> In the Scorers of queries with (lots of) Phrases and/or (nested) Spans, 
> currently next() and skipTo() will use position information even when other 
> parts of the query cannot match because some required terms are not present.
> This could be avoided by adding some methods to Scorer that relax the 
> postcondition of next() and skipTo() to something like "all required terms 
> are present, but no position info was checked yet", and implementing these 
> methods for Scorers that do conjunctions: BooleanScorer, PhraseScorer, and 
> SpanScorer/NearSpans.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
