Alan Woodward created LUCENE-7628:
-------------------------------------

             Summary: Add a getMatchingChildren() method to DisjunctionScorer
                 Key: LUCENE-7628
                 URL: https://issues.apache.org/jira/browse/LUCENE-7628
             Project: Lucene - Core
          Issue Type: Improvement
            Reporter: Alan Woodward
            Assignee: Alan Woodward
            Priority: Minor


This one is a bit convoluted, so bear with me...

The luwak highlighter works by rewriting queries into their Span-equivalents, 
and then running them with a special Collector.  At each matching doc, the 
highlighter gathers all the Spans objects positioned on the current doc and 
collects their positions using the SpanCollection API.

Some queries can't be translated into Spans.  For queries that generate 
Scorers with ChildScorers, such as BooleanQuery, we can call .getChildren() on 
the Scorer and check whether any of the children are SpanScorers; for those 
that aren't, we call .getChildren() again and recurse down.  For each child 
scorer, we check that it's positioned on the current document, so that 
non-matching subscorers can be skipped.
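
As a rough illustration of the recursive walk described above, here is a 
minimal sketch.  It deliberately uses stand-in types (SketchScorer and 
collectSpanScorers are hypothetical names, not Lucene or luwak classes) so that 
it runs on its own; the real code works against Scorer.getChildren() and 
SpanScorer instead.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in types for illustration only; these are NOT Lucene's classes.
class ChildScorerWalk {

    /** Minimal model of a scorer: a current doc id and optional children. */
    static class SketchScorer {
        final String name;
        final int docId;             // document this scorer is positioned on
        final boolean isSpanScorer;  // stands in for "instanceof SpanScorer"
        final List<SketchScorer> children = new ArrayList<>();

        SketchScorer(String name, int docId, boolean isSpanScorer) {
            this.name = name;
            this.docId = docId;
            this.isSpanScorer = isSpanScorer;
        }

        SketchScorer add(SketchScorer child) {
            children.add(child);
            return this;
        }
    }

    /** Recursively gathers the names of span scorers positioned on currentDoc. */
    static void collectSpanScorers(SketchScorer scorer, int currentDoc, List<String> out) {
        if (scorer.docId != currentDoc) {
            return; // not positioned on this doc: skip the whole subtree
        }
        if (scorer.isSpanScorer) {
            out.add(scorer.name); // the highlighter would collect positions here
            return;
        }
        for (SketchScorer child : scorer.children) {
            collectSpanScorers(child, currentDoc, out);
        }
    }

    static List<String> demo() {
        SketchScorer root = new SketchScorer("bool", 5, false)
            .add(new SketchScorer("spanA", 5, true))
            .add(new SketchScorer("spanB", 9, true))   // positioned on another doc
            .add(new SketchScorer("nested", 5, false)
                .add(new SketchScorer("spanC", 5, true)));
        List<String> out = new ArrayList<>();
        collectSpanScorers(root, 5, out);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [spanA, spanC]
    }
}
```

Note that the only filter in this sketch is the doc id comparison, which is 
the check the description above relies on.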

This all works correctly *except* in the case of a DisjunctionScorer where one 
of the children is a two-phase iterator whose approximation has matched the 
current document but whose refinement check has not.  A SpanScorer in this 
situation will be correctly positioned on the current document, but its Spans 
will be in an undefined state, meaning the highlighter will either collect 
incorrect hits, or throw an Exception and prevent hits from being collected 
from other subspans.

We've tried various ways around this (including forking SpanNearQuery and 
adding a bunch of slow position checks to it that are used only by the 
highlighting code), but it turns out that the simplest fix is to add a new 
method to DisjunctionScorer that only returns the currently matching child 
Scorers.  It's a bit of a hack, and it won't be used anywhere else, but it's a 
fairly small and contained hack.
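
To make the difference concrete, here is a small sketch of the idea, not the 
actual patch.  SketchChild, positionedOn() and matchingOn() are hypothetical 
names in a stand-alone model; the real getMatchingChildren() signature and 
implementation on DisjunctionScorer may well look different.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-alone model for illustration; none of these types are Lucene's.
class MatchingChildrenSketch {

    /** Hypothetical two-phase child of a disjunction. */
    static class SketchChild {
        final String name;
        final int approximationDoc;      // doc the cheap approximation sits on
        final boolean refinementMatches; // outcome of the costly confirmation

        SketchChild(String name, int approximationDoc, boolean refinementMatches) {
            this.name = name;
            this.approximationDoc = approximationDoc;
            this.refinementMatches = refinementMatches;
        }
    }

    /** Doc-id check only: includes children whose refinement has failed. */
    static List<String> positionedOn(List<SketchChild> children, int doc) {
        List<String> out = new ArrayList<>();
        for (SketchChild c : children) {
            if (c.approximationDoc == doc) {
                out.add(c.name);
            }
        }
        return out;
    }

    /** Stricter check: only children that fully match the given doc. */
    static List<String> matchingOn(List<SketchChild> children, int doc) {
        List<String> out = new ArrayList<>();
        for (SketchChild c : children) {
            if (c.approximationDoc == doc && c.refinementMatches) {
                out.add(c.name);
            }
        }
        return out;
    }

    static List<SketchChild> demoChildren() {
        List<SketchChild> children = new ArrayList<>();
        children.add(new SketchChild("termA", 7, true));
        children.add(new SketchChild("nearB", 7, false)); // approximation only
        children.add(new SketchChild("termC", 12, true)); // on a later doc
        return children;
    }

    public static void main(String[] args) {
        List<SketchChild> children = demoChildren();
        System.out.println(positionedOn(children, 7)); // prints [termA, nearB]
        System.out.println(matchingOn(children, 7));   // prints [termA]
    }
}
```

In this model "nearB" is the problematic child: it passes the doc id check, 
but collecting positions from it would be unsafe because its refinement never 
matched.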



