[
https://issues.apache.org/jira/browse/LUCENE-7628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15822947#comment-15822947
]
Adrien Grand commented on LUCENE-7628:
--------------------------------------
I'm not sure about this change since it puts more pressure on the Scorer API.
For instance if scores are not needed, the Scorer does not need to know about
the matching sub clauses, so there could be optimizations based on that, but
this change requires every Scorer to be able to return the list of matching
sub scorers.
By the way, the MinShouldMatchSumScorer impl is buggy since this Scorer
advances the sub scorers lazily: for instance if {{minShouldMatch}} is 2, it
will stop advancing sub clauses as soon as 2 matching scorers are found. The
other scorers will only be advanced if {{score()}} is called. I think that could
be fixed by calling {{updateFreq()}} before iterating the matching scorers,
like {{score()}} does.
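To make the lazy-advance problem concrete, here is a toy sketch. It is not
Lucene code: the class {{LazyMinShouldMatchSketch}}, its {{Sub}} type and the
{{lead}}/{{tail}} lists are hypothetical stand-ins, and only the name
{{updateFreq()}} is borrowed from the real scorer. It assumes sub scorers are
plain sorted doc ID lists.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model, not Lucene code: each "sub scorer" is a sorted
// list of doc IDs with a cursor.
final class LazyMinShouldMatchSketch {

    static final class Sub {
        final String name;
        final int[] docs;
        int pos = 0;

        Sub(String name, int... docs) {
            this.name = name;
            this.docs = docs;
        }

        // Current doc ID, or Integer.MAX_VALUE once exhausted.
        int docID() {
            return pos < docs.length ? docs[pos] : Integer.MAX_VALUE;
        }

        // Move the cursor to the first doc ID >= target.
        void advance(int target) {
            while (pos < docs.length && docs[pos] < target) {
                pos++;
            }
        }
    }

    private final Sub[] subs;
    private final int minShouldMatch;
    private final List<Sub> lead = new ArrayList<>(); // known to match the current doc
    private final List<Sub> tail = new ArrayList<>(); // not advanced to the current doc yet
    private int doc = -1;

    LazyMinShouldMatchSketch(int minShouldMatch, Sub... subs) {
        this.minShouldMatch = minShouldMatch;
        this.subs = subs;
    }

    // Lazy match check: stop advancing once minShouldMatch subs have matched.
    boolean matches(int target) {
        lead.clear();
        tail.clear();
        doc = target;
        for (Sub sub : subs) {
            if (lead.size() >= minShouldMatch) {
                tail.add(sub); // deliberately left behind
                continue;
            }
            sub.advance(target);
            if (sub.docID() == target) {
                lead.add(sub);
            }
        }
        return lead.size() >= minShouldMatch;
    }

    // Toy analogue of the catch-up step: advance the subs that were left behind.
    void updateFreq() {
        for (Sub sub : tail) {
            sub.advance(doc);
            if (sub.docID() == doc) {
                lead.add(sub);
            }
        }
        tail.clear();
    }

    // Names of the matching subs; without catchUp the answer can be incomplete.
    List<String> matchingChildren(boolean catchUp) {
        if (catchUp) {
            updateFreq();
        }
        List<String> names = new ArrayList<>();
        for (Sub sub : lead) {
            names.add(sub.name);
        }
        return names;
    }

    static LazyMinShouldMatchSketch example() {
        return new LazyMinShouldMatchSketch(2,
            new Sub("a", 1, 5), new Sub("b", 5, 7), new Sub("c", 3, 5));
    }

    public static void main(String[] args) {
        LazyMinShouldMatchSketch s = example();
        System.out.println(s.matches(5));              // true
        System.out.println(s.matchingChildren(false)); // [a, b]
        System.out.println(s.matchingChildren(true));  // [a, b, c]
    }
}
```

In this model all three subs contain doc 5, but the list returned without the
catch-up step misses {{c}} because it was never advanced.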
I know you marked this new method as experimental, so we could remove this
method if that ever becomes a problem. However, I have the feeling that
{{getChildren()}} exists for the exact same reason that you are now adding
{{getMatchingChildren()}} so could we remove {{getChildren()}} now? Sorry for
being annoying but I think it is important to keep the Scorer API small.
Could you also add javadocs stating that this method may only be called from
scorers created through {{Weight.scorer}} and not e.g. from collectors?
Otherwise there
will be issues if users try to call this API when bulk scorers are used that
pass fake scorers to collectors. We have {{ToParentBlockJoinCollector}} that
uses {{getChildren}}, and it is an issue since it means it cannot work with
BS1, which is one of the most commonly used scorers.
Could we revert this change on the 6.4 branch so that we have time to clean
this up a bit before exposing it to users?
> Add a getMatchingChildren() method to DisjunctionScorer
> -------------------------------------------------------
>
> Key: LUCENE-7628
> URL: https://issues.apache.org/jira/browse/LUCENE-7628
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Alan Woodward
> Assignee: Alan Woodward
> Priority: Minor
> Fix For: 6.4
>
> Attachments: LUCENE-7628.patch
>
>
> This one is a bit convoluted, so bear with me...
> The luwak highlighter works by rewriting queries into their Span-equivalents,
> and then running them with a special Collector. At each matching doc, the
> highlighter gathers all the Spans objects positioned on the current doc and
> collects their positions using the SpanCollection API.
> Some queries can't be translated into Spans. For those queries that generate
> Scorers with ChildScorers, like BooleanQuery, we can call .getChildren() on
> the Scorer and see if any of them are SpanScorers, and for those that aren't
> we can call .getChildren() again and recurse down. For each child scorer, we
> check that it's positioned on the current document, so non-matching
> subscorers can be skipped.
> This all works correctly *except* in the case of a DisjunctionScorer where
> one of the children is a two-phase iterator that has matched its
> approximation, but not its refinement query. A SpanScorer in this situation
> will be correctly positioned on the current document, but its Spans will be
> in an undefined state, meaning the highlighter will either collect incorrect
> hits, or it will throw an Exception and prevent hits being collected from
> other subspans.
> We've tried various ways around this (including forking SpanNearQuery and
> adding a bunch of slow position checks to it that are used only by the
> highlighting code), but it turns out that the simplest fix is to add a new
> method to DisjunctionScorer that only returns the currently matching child
> Scorers. It's a bit of a hack, and it won't be used anywhere else, but it's
> a fairly small and contained hack.
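The child walk from the description quoted above can be sketched with a toy
model. This is not Lucene or luwak code: {{ChildScorerWalkSketch}}, {{Node}}
and the {{confirmed}} flag are hypothetical, with {{confirmed == false}}
standing in for a two-phase iterator whose approximation matched but whose
refinement did not.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical toy model, not Lucene code.
final class ChildScorerWalkSketch {

    static final class Node {
        final String name;
        final int docID;         // doc the node is currently positioned on
        final boolean confirmed; // false: approximation matched, refinement did not
        final boolean isSpan;    // stands in for "is a SpanScorer"
        final List<Node> children;

        Node(String name, int docID, boolean confirmed, boolean isSpan, Node... children) {
            this.name = name;
            this.docID = docID;
            this.confirmed = confirmed;
            this.isSpan = isSpan;
            this.children = Arrays.asList(children);
        }
    }

    static Node span(String name, int docID, boolean confirmed) {
        return new Node(name, docID, confirmed, true);
    }

    static Node bool(String name, int docID, Node... children) {
        return new Node(name, docID, true, false, children);
    }

    // Recurse through the children and keep span leaves positioned on doc.
    // requireConfirmed=false is the position-only check; requireConfirmed=true
    // models asking only for the currently matching children.
    static List<String> collectSpans(Node node, int doc, boolean requireConfirmed) {
        List<String> out = new ArrayList<>();
        walk(node, doc, requireConfirmed, out);
        return out;
    }

    private static void walk(Node node, int doc, boolean requireConfirmed, List<String> out) {
        if (node.docID != doc) {
            return; // not positioned on the current doc: skip
        }
        if (requireConfirmed && !node.confirmed) {
            return; // positioned, but not actually matching
        }
        if (node.isSpan) {
            out.add(node.name);
            return;
        }
        for (Node child : node.children) {
            walk(child, doc, requireConfirmed, out);
        }
    }

    static Node example() {
        return bool("root", 7,
            span("spanA", 7, true),   // matches doc 7
            span("spanB", 7, false),  // approximation only
            span("spanC", 9, true));  // positioned on a later doc
    }

    public static void main(String[] args) {
        System.out.println(collectSpans(example(), 7, false)); // [spanA, spanB]
        System.out.println(collectSpans(example(), 7, true));  // [spanA]
    }
}
```

The position-only walk picks up {{spanB}} even though it does not match,
which is the case the issue describes.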