[jira] [Commented] (LUCENE-7628) Add a getMatchingChildren() method to DisjunctionScorer

Paul Elschot (JIRA) Fri, 13 Jan 2017 13:20:41 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-7628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15822361#comment-15822361
 ]


Paul Elschot commented on LUCENE-7628:
--------------------------------------

To continue the discussion about using Spans directly for this
(posted earlier on GitHub, see 
https://github.com/flaxsearch/luwak/commit/36c91e8bdd3ab0d07578b76359d1f2a87eb53797):

Besides AND and OR within the same field, we still need to deal with 
multiple fields.
For this we need a Spans that can share its DocIdSetIterator with another Spans.

IIRC that is what LUCENE-2878 is about, so I'm finally beginning to understand 
the real point of that issue, and why it is still open.

Meanwhile we had DocIdSetIterator split off from Scorer (for speed).
How about doing something similar for Spans? I think that would leave Spans 
pretty close to the Positions of LUCENE-2878. The only change in semantics for 
Spans would be that at least one of the Spans that share a DocIdSetIterator 
should provide a real position in a document. Maybe we could have something 
like MultiFieldSpans for that.
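
One possible reading of the shared-iterator idea, as a toy sketch in plain 
Java. None of these types exist in Lucene: SharedDocs, FieldPositions and 
ToyMultiFieldSpans are hypothetical names, and the in-memory position maps 
stand in for real postings. The sketch only illustrates the proposed rule that 
a document counts when at least one of the sharing views has a real position 
in it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Toy model only: hypothetical types, not part of Lucene. */
final class SharedIteratorSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** A doc-id iterator that several per-field position views share. */
    static final class SharedDocs {
        private final int[] docs; // sorted candidate doc ids
        private int idx = -1;

        SharedDocs(int... docs) { this.docs = docs; }

        int docID() {
            if (idx < 0) return -1;
            return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
        }

        int nextDoc() {
            if (idx < docs.length) idx++;
            return docID();
        }
    }

    /** Positions for one field. It never advances documents itself. */
    static final class FieldPositions {
        final String field;
        private final SharedDocs docs;
        private final Map<Integer, int[]> positionsByDoc;

        FieldPositions(String field, SharedDocs docs, Map<Integer, int[]> positionsByDoc) {
            this.field = field;
            this.docs = docs;
            this.positionsByDoc = positionsByDoc;
        }

        /** Positions in the document the shared iterator is currently on. */
        int[] positions() {
            return positionsByDoc.getOrDefault(docs.docID(), new int[0]);
        }
    }

    /** Hypothetical MultiFieldSpans: one shared iterator, many field views. */
    static final class ToyMultiFieldSpans {
        private final SharedDocs docs;
        private final List<FieldPositions> fields;

        ToyMultiFieldSpans(SharedDocs docs, List<FieldPositions> fields) {
            this.docs = docs;
            this.fields = fields;
        }

        /** Next doc in which at least one field view has a real position. */
        int nextDoc() {
            while (true) {
                int doc = docs.nextDoc();
                if (doc == NO_MORE_DOCS || !fieldsWithPositions().isEmpty()) {
                    return doc;
                }
            }
        }

        /** Names of the fields that have positions in the current doc. */
        List<String> fieldsWithPositions() {
            List<String> out = new ArrayList<>();
            for (FieldPositions f : fields) {
                if (f.positions().length > 0) out.add(f.field);
            }
            return out;
        }
    }
}
```

Here the field views hold no document state of their own, so they cannot 
drift apart; whether that is workable for real Spans is exactly the open 
question.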

> Add a getMatchingChildren() method to DisjunctionScorer
> -------------------------------------------------------
>
>                 Key: LUCENE-7628
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7628
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Alan Woodward
>            Assignee: Alan Woodward
>            Priority: Minor
>         Attachments: LUCENE-7628.patch
>
>
> This one is a bit convoluted, so bear with me...
> The luwak highlighter works by rewriting queries into their Span-equivalents, 
> and then running them with a special Collector.  At each matching doc, the 
> highlighter gathers all the Spans objects positioned on the current doc and 
> collects their positions using the SpanCollection API.
> Some queries can't be translated into Spans.  For those queries that generate 
> Scorers with ChildScorers, like BooleanQuery, we can call .getChildren() on 
> the Scorer and see if any of them are SpanScorers, and for those that aren't 
> we can call .getChildren() again and recurse down.  For each child scorer, we 
> check that it's positioned on the current document, so non-matching 
> subscorers can be skipped.
> This all works correctly *except* in the case of a DisjunctionScorer where 
> one of the children is a two-phase iterator that has matched its 
> approximation, but not its refinement query.  A SpanScorer in this situation 
> will be correctly positioned on the current document, but its Spans will be 
> in an undefined state, meaning the highlighter will either collect incorrect 
> hits, or it will throw an Exception and prevent hits being collected from 
> other subspans.
> We've tried various ways around this (including forking SpanNearQuery and 
> adding a bunch of slow position checks to it that are used only by the 
> highlighting code), but it turns out that the simplest fix is to add a new 
> method to DisjunctionScorer that only returns the currently matching child 
> Scorers.  It's a bit of a hack, and it won't be used anywhere else, but it's 
> a fairly small and contained hack.
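
The two-phase problem described above can be illustrated with a toy model in 
plain Java. This is a sketch under stated assumptions, not the attached patch 
and not Lucene's Scorer API: ToyChild, ToyDisjunction and 
getPositionedChildren() are hypothetical names, and getMatchingChildren() 
here only mimics the intent of the proposed method. Each child has a cheap 
approximation (a sorted list of candidate docs) and a refinement check that 
confirms a real match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

/** Toy model only: these classes imitate the idea, they are not Lucene's API. */
final class MatchingChildrenSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** A child with an approximation and a second-phase refinement. */
    static final class ToyChild {
        final String name;
        private final int[] candidates;       // docs accepted by the approximation
        private final IntPredicate refinement; // does the doc really match?
        private int idx = -1;

        ToyChild(String name, int[] candidates, IntPredicate refinement) {
            this.name = name;
            this.candidates = candidates;
            this.refinement = refinement;
        }

        int docID() {
            if (idx < 0) return -1;
            return idx < candidates.length ? candidates[idx] : NO_MORE_DOCS;
        }

        /** Moves the approximation to the first candidate >= target. */
        int advance(int target) {
            if (idx < 0) idx = 0;
            while (idx < candidates.length && candidates[idx] < target) idx++;
            return docID();
        }

        boolean matches() { return refinement.test(docID()); }
    }

    static final class ToyDisjunction {
        private final List<ToyChild> children;
        private int doc = -1;

        ToyDisjunction(List<ToyChild> children) { this.children = children; }

        /** Next doc on which at least one child passes its refinement. */
        int nextDoc() {
            int target = doc + 1;
            while (true) {
                int min = NO_MORE_DOCS;
                for (ToyChild c : children) min = Math.min(min, c.advance(target));
                doc = min;
                if (doc == NO_MORE_DOCS || !getMatchingChildren().isEmpty()) return doc;
                target = doc + 1;
            }
        }

        /** Children positioned on the current doc, whether or not they truly match. */
        List<ToyChild> getPositionedChildren() {
            List<ToyChild> out = new ArrayList<>();
            if (doc < 0 || doc == NO_MORE_DOCS) return out;
            for (ToyChild c : children) if (c.docID() == doc) out.add(c);
            return out;
        }

        /** Only the children whose refinement confirms the current doc. */
        List<ToyChild> getMatchingChildren() {
            List<ToyChild> out = new ArrayList<>();
            for (ToyChild c : getPositionedChildren()) if (c.matches()) out.add(c);
            return out;
        }
    }

    static List<String> names(List<ToyChild> children) {
        List<String> out = new ArrayList<>();
        for (ToyChild c : children) out.add(c.name);
        return out;
    }
}
```

With a "term" child on docs 1, 5, 9 that always matches and a "near" child 
whose approximation accepts docs 5 and 7 but whose refinement only accepts 7, 
both children are positioned on doc 5, yet only "term" is returned by 
getMatchingChildren(). A check of docID() alone would also hand the 
highlighter the "near" child, whose positions are not valid on that doc.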



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
