[ 
https://issues.apache.org/jira/browse/LUCY-180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13098441#comment-13098441
 ] 

Marvin Humphrey commented on LUCY-180:
--------------------------------------

Issue description edited:

Thankfully, a closer look at ANDQuery, ORQuery, and RequiredOptionalQuery has
revealed that while they do not pass down custom boosts when compiling down to
an "only child" Matcher, the boosts have not been discarded.  Instead, the
boosts have been propagated down into all child Compiler objects during the
weighting phase.

(Multiplying e.g. ANDQuery's boost into its children during the weighting
phase frees ANDMatcher from the need to multiply the boost into the score
for each document.)
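
The propagation described above can be sketched with a toy model.  This is
not Lucy's API: ToyQuery and effective_boosts() are hypothetical names used
only for illustration, and the real weighting phase also folds in
normalization factors that are left out here.

```python
from dataclasses import dataclass, field


@dataclass
class ToyQuery:
    """Hypothetical stand-in for a query node; not a Lucy class."""
    boost: float = 1.0
    children: list = field(default_factory=list)


def effective_boosts(query, inherited=1.0):
    """Return the boost each leaf ends up with once every ancestor's
    boost has been multiplied down into it."""
    total = inherited * query.boost
    if not query.children:
        return [total]
    leaves = []
    for child in query.children:
        leaves.extend(effective_boosts(child, total))
    return leaves


# A parent with boost 2.0 over two children with boosts 1.5 and 1.0.
leaf_boosts = effective_boosts(
    ToyQuery(boost=2.0, children=[ToyQuery(boost=1.5), ToyQuery(boost=1.0)])
)

# The "only child" case: the parent's boost still reaches the child.
only_child = effective_boosts(
    ToyQuery(boost=2.0, children=[ToyQuery(boost=1.5)])
)
```

In this model the only child already carries the parent's boost, so handing
back the child's Matcher directly loses nothing.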

> ORQuery, ANDQuery, RequiredOptionalQuery optimizations affect scoring
> ---------------------------------------------------------------------
>
>                 Key: LUCY-180
>                 URL: https://issues.apache.org/jira/browse/LUCY-180
>             Project: Lucy
>          Issue Type: Bug
>    Affects Versions: 0.1.0 (incubating), 0.2.0 (incubating), 0.2.1 (incubating)
>            Reporter: Marvin Humphrey
>            Assignee: Marvin Humphrey
>             Fix For: 0.2.2 (incubating), 0.3.0 (incubating)
>
>
> ORQuery, ANDQuery, and RequiredOptionalQuery all have optimizations which kick
> in when only one child Query can match: they all compile down to the inner
> Matcher.
> In the case of ORQuery and RequiredOptionalQuery, this optimization can kick
> in per-segment, resulting in an ORMatcher/RequiredOptionalMatcher for some
> segments and e.g. a child TermMatcher for others.  This skews scoring because
> coord() affects the ORMatcher/RequiredOptionalMatcher, but not the TermMatcher
> -- the ORMatcher/RequiredOptionalMatcher damps the score of the matching term
> by a coord() multiplier which is typically less than 1.0, but the TermMatcher
> contributes 100% of its score.  The punchline is that two documents in
> different segments which present identical match criteria can produce
> different scores, depending on whether terms not present in the document are
> represented in the segment.
> In addition, ORQuery may compile down to a smaller ORMatcher when
> e.g. 3 out of 5 OR'd terms are present.  This skews scoring for similar
> reasons.
> To present consistent scoring across all segments, Queries should always
> compile down to the same Matcher node structure for each segment.  By the time
> you are compiling per-segment Matchers, it is too late to re-calculate the
> weighting, so you can't optimize the Matcher structure when you find that e.g.
> one of two terms doesn't exist in a given segment.
> -In addition, when compiling down to a single child Matcher, ORQuery, ANDQuery-
> -and RequiredOptionalQuery all discard custom boosts.  This is solvable by-
> -moving the optimization from Compiler_Make_Matcher() up into-
> -Query_Make_Compiler().-
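
The per-segment skew described in the issue can be illustrated with a toy
calculation.  It assumes coord() is simply overlap / max_overlap and that an
OR-style matcher sums the scores of its matching clauses before applying
coord(); the function names and the 0.8 term score are made up for the
example and are not taken from Lucy.

```python
def coord(overlap, max_overlap):
    """Assumed coord(): fraction of clauses that matched."""
    return 1.0 if max_overlap == 0 else overlap / max_overlap


def or_matcher_score(matching_term_scores, num_clauses):
    """Toy OR-style score: sum of matching clause scores, damped by coord()."""
    overlap = len(matching_term_scores)
    return sum(matching_term_scores) * coord(overlap, num_clauses)


def term_matcher_score(term_score):
    """Toy single-term score: the term contributes 100% of its score."""
    return term_score


term_score = 0.8  # hypothetical score for the one term the document matches

# Segment A: both OR'd terms exist in the segment, so an OR matcher is built.
segment_a = or_matcher_score([term_score], num_clauses=2)

# Segment B: the other term is absent from the segment, so the optimization
# compiles down to the bare term matcher.
segment_b = term_matcher_score(term_score)
```

Both documents match exactly one of the two terms, yet the document in
segment A scores half of what the document in segment B scores.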

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira