[ https://issues.apache.org/jira/browse/LUCENE-7628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15818467#comment-15818467 ]
Alan Woodward commented on LUCENE-7628:
---------------------------------------
Well, LUCENE-2878 is still open :)
Some of the luwak highlighter will probably make it back into core at some
point - I think [~dsmiley] is planning on using at least some of it in the
UnifiedHighlighter in the future. In the meantime, it's all open source - help
yourself!
https://github.com/flaxsearch/luwak/blob/master/luwak/src/main/java/uk/co/flax/luwak/matchers/HighlightingMatcher.java
> Add a getMatchingChildren() method to DisjunctionScorer
> -------------------------------------------------------
>
> Key: LUCENE-7628
> URL: https://issues.apache.org/jira/browse/LUCENE-7628
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Alan Woodward
> Assignee: Alan Woodward
> Priority: Minor
>
> This one is a bit convoluted, so bear with me...
> The luwak highlighter works by rewriting queries into their Span-equivalents,
> and then running them with a special Collector. At each matching doc, the
> highlighter gathers all the Spans objects positioned on the current doc and
> collects their positions using the SpanCollection API.
> Some queries can't be translated into Spans. For queries that generate
> Scorers with ChildScorers, like BooleanQuery, we can call .getChildren() on
> the Scorer and check whether any of the children are SpanScorers. For those
> that aren't, we call .getChildren() again and recurse down. For each child
> scorer, we check that it's positioned on the current document, so
> non-matching subscorers can be skipped.
> This all works correctly *except* in the case of a DisjunctionScorer where
> one of the children is a two-phase iterator that has matched its
> approximation, but not its refinement query. A SpanScorer in this situation
> will be correctly positioned on the current document, but its Spans will be
> in an undefined state, so the highlighter will either collect incorrect
> hits or throw an Exception and prevent hits from being collected from
> other subspans.
> We've tried various ways around this (including forking SpanNearQuery and
> adding a bunch of slow position checks to it that are used only by the
> highlighting code), but it turns out that the simplest fix is to add a new
> method to DisjunctionScorer that only returns the currently matching child
> Scorers. It's a bit of a hack, and it won't be used anywhere else, but it's
> a fairly small and contained hack.
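
For readers who want the problem and the proposed fix in concrete form, here
is a minimal, self-contained Java sketch. It deliberately does not use
Lucene's classes: SimpleScorer, Leaf, Disjunction, collectNaive and
collectMatching are hypothetical stand-ins, and the matches() flag only models
whether a two-phase iterator's refinement step succeeded. The actual
DisjunctionScorer change may well look different.

```java
import java.util.ArrayList;
import java.util.List;

public class MatchingChildrenSketch {

    /** Hypothetical stand-in for a scorer; this is NOT Lucene's Scorer API. */
    interface SimpleScorer {
        int docID();                      // document the scorer is positioned on
        boolean matches();                // models the two-phase refinement check
        List<SimpleScorer> getChildren(); // all children, matching or not
    }

    /** A leaf scorer, playing the role of a SpanScorer in the description. */
    static final class Leaf implements SimpleScorer {
        final String name;
        final int doc;
        final boolean refinementMatches;

        Leaf(String name, int doc, boolean refinementMatches) {
            this.name = name;
            this.doc = doc;
            this.refinementMatches = refinementMatches;
        }

        public int docID() { return doc; }
        public boolean matches() { return refinementMatches; }
        public List<SimpleScorer> getChildren() { return new ArrayList<>(); }
    }

    /** A simplified disjunction over child scorers. */
    static final class Disjunction implements SimpleScorer {
        final List<SimpleScorer> children;

        Disjunction(List<SimpleScorer> children) { this.children = children; }

        // Positioned on the smallest document any child is positioned on.
        public int docID() {
            int min = Integer.MAX_VALUE;
            for (SimpleScorer c : children) min = Math.min(min, c.docID());
            return min;
        }

        public boolean matches() { return !getMatchingChildren().isEmpty(); }
        public List<SimpleScorer> getChildren() { return children; }

        // Only children on the current doc whose refinement also matched.
        List<SimpleScorer> getMatchingChildren() {
            List<SimpleScorer> out = new ArrayList<>();
            int doc = docID();
            for (SimpleScorer c : children) {
                if (c.docID() == doc && c.matches()) out.add(c);
            }
            return out;
        }
    }

    /** Recursion that checks position only, as in the original approach. */
    static void collectNaive(SimpleScorer s, int doc, List<String> out) {
        if (s.docID() != doc) return;
        if (s instanceof Leaf) { out.add(((Leaf) s).name); return; }
        for (SimpleScorer c : s.getChildren()) collectNaive(c, doc, out);
    }

    /** Same recursion, but asks disjunctions for matching children only. */
    static void collectMatching(SimpleScorer s, int doc, List<String> out) {
        if (s.docID() != doc) return;
        if (s instanceof Leaf) { out.add(((Leaf) s).name); return; }
        List<SimpleScorer> kids = (s instanceof Disjunction)
                ? ((Disjunction) s).getMatchingChildren()
                : s.getChildren();
        for (SimpleScorer c : kids) collectMatching(c, doc, out);
    }

    public static void main(String[] args) {
        List<SimpleScorer> kids = new ArrayList<>();
        kids.add(new Leaf("a", 5, true));   // on doc 5, refinement matched
        kids.add(new Leaf("b", 5, false));  // on doc 5, approximation only
        kids.add(new Leaf("c", 7, true));   // positioned on a later doc
        Disjunction d = new Disjunction(kids);

        List<String> naive = new ArrayList<>();
        collectNaive(d, 5, naive);
        List<String> filtered = new ArrayList<>();
        collectMatching(d, 5, filtered);

        System.out.println(naive);    // prints [a, b]
        System.out.println(filtered); // prints [a]
    }
}
```

In this toy model, "b" is the child that matched its approximation but not its
refinement: the position-only walk still visits it, while the walk that uses
getMatchingChildren() skips it.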