[jira] [Commented] (LUCENE-7580) Spans tree scoring

Paul Elschot (JIRA) Sun, 04 Dec 2016 06:54:05 -0800

    [ https://issues.apache.org/jira/browse/LUCENE-7580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15720066#comment-15720066 ]


Paul Elschot commented on LUCENE-7580:
--------------------------------------

In the patch, SpansTreeQuery is a wrapper for SpanQuery that scores in
essentially the same way as other queries.
When all term occurrences match at the top level or at distance 0, the score
is the same as the score of a boolean OR over the terms, independently of the
Similarity that is used.
SpansTreeScorer scores each matching occurrence of a query term, and it
applies discounts for non-matching terms and for distance matches.
It also uses the weights of subqueries.

The matching occurrences are recorded per document in the spans tree at each
top level match of a document.
For each match, SpansTreeScorer descends the tree down to the leaf level of
the terms of that match.
SpansDocScorer objects are used as the tree nodes; there is one for each
supported Spans.

Each matching term occurrence is recorded with a slop factor.
At the top level this slop factor is normally 1, and at each span near
nesting level the slop factor of the match at that level is multiplied into
it.
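
As an illustration of how the slop factor could accumulate over nesting
levels, here is a minimal sketch. It is not code from the patch. The class
and method names are hypothetical, and the factor 1 / (distance + 1) is an
assumption borrowed from the shape of ClassicSimilarity.sloppyFreq; the
patch may take its factor from the Similarity in a different way.

```java
// Hypothetical sketch, not taken from the LUCENE-7580 patch.
final class SlopFactorSketch {

    // Assumed slop factor for one span near match: 1 / (distance + 1),
    // the same shape as ClassicSimilarity.sloppyFreq.
    static double slopFactor(int distance) {
        return 1.0 / (distance + 1);
    }

    // Start with 1 at the top level and multiply in the slop factor of the
    // match at each span near nesting level on the way down to the term.
    static double combinedSlopFactor(int... distancePerNestingLevel) {
        double factor = 1.0;
        for (int distance : distancePerNestingLevel) {
            factor *= slopFactor(distance);
        }
        return factor;
    }
}
```

With these assumptions, a term that is reached through two nested span
nears that matched at distances 1 and 3 is recorded with 0.5 * 0.25, and a
term that matches at distance 0 on every level keeps the factor 1.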

The term frequency scoring from the Similarity is used per matching term
occurrence, and these term occurrence scores are weighted by the slop factors
sorted in decreasing order.
The purpose of using the given slop factors in decreasing order is to provide
scoring consistency between span near queries that only differ in the maximum
allowed slop.
This consistency requires that an extra match with a lower slop factor
increases the score of the document.
I would expect scoring to be consistent this way, but I'm not 100% sure.
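
One possible reading of this weighting is sketched below. It is not code
from the patch. The sketch assumes that the score increment of the i-th
matching occurrence, tfScore(i) - tfScore(i - 1), is multiplied by the i-th
largest slop factor. The tfScore function is a hypothetical stand-in for
the term frequency part of a Similarity; its saturating shape and the
constant 1.2 are illustrative only.

```java
import java.util.Arrays;

// Hypothetical sketch, not taken from the LUCENE-7580 patch.
final class SortedSlopWeighting {

    // Stand-in for the term frequency score of a Similarity (assumption):
    // increasing and saturating in the frequency.
    static double tfScore(int freq) {
        return freq / (freq + 1.2);
    }

    // Weight the increment of the i-th occurrence by the i-th largest
    // slop factor, so the largest factors go to the largest increments.
    static double matchingScore(double[] slopFactors) {
        double[] sorted = slopFactors.clone();
        Arrays.sort(sorted); // ascending, read from the end below
        int n = sorted.length;
        double score = 0.0;
        for (int i = 1; i <= n; i++) {
            double factor = sorted[n - i]; // i-th largest slop factor
            score += factor * (tfScore(i) - tfScore(i - 1));
        }
        return score;
    }
}
```

Under these assumptions, a document in which every occurrence has slop
factor 1 gets exactly tfScore(n), and adding one more occurrence with a
lower slop factor adds a positive term, so the score goes up.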

The non-matching term occurrences get a score that is the difference between
the normal document term frequency score and the term frequency score for the
matching occurrences.
This non-matching score is weighted by the slop factor of a non-matching
distance.
The non-matching distance is a parameter that must be provided.
It can for example be chosen a little larger than the largest distance used
in the span near queries that are wrapped.
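
The non-matching score described above can be sketched as follows. This is
not code from the patch. The names are hypothetical, tfScore is a stand-in
for the term frequency part of a Similarity, and the slop factor
1 / (distance + 1) is an assumption borrowed from the shape of
ClassicSimilarity.sloppyFreq.

```java
// Hypothetical sketch, not taken from the LUCENE-7580 patch.
final class NonMatchingScoreSketch {

    // Stand-in for the term frequency score of a Similarity (assumption).
    static double tfScore(int freq) {
        return freq / (freq + 1.2);
    }

    // Assumed slop factor: 1 / (distance + 1).
    static double slopFactor(int distance) {
        return 1.0 / (distance + 1);
    }

    // Difference between the tf score for all occurrences in the document
    // and the tf score for the matching occurrences, weighted by the slop
    // factor of the provided non-matching distance.
    static double nonMatchingScore(int docTermFreq, int matchingFreq,
                                   int nonMatchingDistance) {
        double unmatched = tfScore(docTermFreq) - tfScore(matchingFreq);
        return slopFactor(nonMatchingDistance) * unmatched;
    }
}
```

In this sketch a term whose occurrences all match gets a non-matching score
of 0, and a larger non-matching distance gives the remaining occurrences a
smaller contribution.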

SpansTreeQuery is implemented for any combination of
SpanNearQuery, SpanOrQuery, SpanTermQuery, SpanBoostQuery,
SpanNotQuery, SpanFirstQuery, SpanContainingQuery and SpanWithinQuery.

See the javadocs and the test code for how to use SpansTreeQuery.


> Spans tree scoring
> ------------------
>
>                 Key: LUCENE-7580
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7580
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/search
>    Affects Versions: master (7.0)
>            Reporter: Paul Elschot
>            Priority: Minor
>             Fix For: 6.x
>
>         Attachments: LUCENE-7580.patch
>
>
> Recurse the spans tree to compose a score based on the type of subqueries and 
> what matched



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
