[
https://issues.apache.org/jira/browse/LUCENE-7151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
David Wendt updated LUCENE-7151:
--------------------------------
Attachment: SpanScore5Bug.java
The attached example shows the score error: doc0 and doc1 should score the
same, but only doc1 has a non-zero score. NearSpansUnordered reports different
span widths, so the frequencies differ and the scoring is off.
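
To make the link between span width and frequency concrete, here is a minimal
sketch. It assumes the 1 / (distance + 1) sloppy-frequency formula used by
Lucene's DefaultSimilarity; the class and method names are hypothetical and
this is not the scorer's actual code.

```java
public class SloppyFreqSketch {
    // Assumption: same shape as DefaultSimilarity.sloppyFreq, 1 / (distance + 1).
    static float sloppyFreq(int distance) {
        return 1.0f / (distance + 1);
    }

    // Sum the per-match contributions for a list of match widths (hypothetical helper).
    static float totalFreq(int[] widths) {
        float freq = 0f;
        for (int w : widths) {
            freq += sloppyFreq(w);
        }
        return freq;
    }

    public static void main(String[] args) {
        // The same logical match reported with two different widths
        // yields two different frequencies, and therefore different scores.
        System.out.println("width 0: " + totalFreq(new int[] {0}));
        System.out.println("width 2: " + totalFreq(new int[] {2}));
    }
}
```

If the same phrase is reported with width 0 in one document and width 2 in
another, the accumulated frequencies differ even though the matches are
equivalent.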
> Nested spanNear scoring error when inner clauses overlap positions
> ------------------------------------------------------------------
>
> Key: LUCENE-7151
> URL: https://issues.apache.org/jira/browse/LUCENE-7151
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/query/scoring
> Affects Versions: 5.3.1, 5.5
> Environment: Windows, Linux
> Reporter: David Wendt
> Labels: newbie
> Attachments: SpanScore5Bug.java
>
>
> For spanNear([spanNear([contents:word1, contents:word3], 2, true),
> spanNear([contents:word2, contents:word3], 2, true)], 2, false)
> Scores for the following two documents should be the same but are not.
> doc1: [----- word1 word2 ----- word2 word3 ----- word1 word2 word3 -----]
> doc2: [----- word2 word3 ----- word1 word3 ----- word1 word2 word3 -----]
> The positions of the inner clauses affect the scoring of the final
> 3-term phrase. This appears to be a side-effect of the span-scoring rewrite
> in 5.2(?).
> SpansCell.adjustMax() uses only end-position values to decide
> maxEndPositionCell, while the SpanPositionQueue uses both start-position and
> end-position values to sort the SpansCell instances. This means that
> maxEndPositionCell will be set incorrectly, or not set at all, depending on
> previous positions.
> I can provide example code illustrating the score error.
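
The ordering mismatch described above can be sketched in isolation. This is a
simplified model, not the Lucene source: Cell, BY_START_THEN_END, BY_END, and
last() are hypothetical stand-ins for a span cell, the queue ordering, and the
end-position-only comparison.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SpanOrderingSketch {
    // Hypothetical stand-in for a span cell covering positions [start, end).
    static final class Cell {
        final String name;
        final int start;
        final int end;

        Cell(String name, int start, int end) {
            this.name = name;
            this.start = start;
            this.end = end;
        }
    }

    // Ordering in the style of the queue: start position first, then end position.
    static final Comparator<Cell> BY_START_THEN_END =
        Comparator.comparingInt((Cell c) -> c.start).thenComparingInt(c -> c.end);

    // Ordering in the style of the max tracking: end position only.
    static final Comparator<Cell> BY_END = Comparator.comparingInt((Cell c) -> c.end);

    // Return the cell that sorts last under the given ordering.
    static Cell last(List<Cell> cells, Comparator<Cell> order) {
        List<Cell> sorted = new ArrayList<>(cells);
        sorted.sort(order);
        return sorted.get(sorted.size() - 1);
    }

    public static void main(String[] args) {
        // Two overlapping spans: A starts first but ends last.
        List<Cell> cells = Arrays.asList(new Cell("A", 1, 6), new Cell("B", 4, 5));

        System.out.println(last(cells, BY_START_THEN_END).name); // prints "B"
        System.out.println(last(cells, BY_END).name);            // prints "A"
    }
}
```

When inner spans overlap like this, the two orderings pick different cells, so
code that assumes they agree can end up tracking the wrong maximum end
position.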