[
https://issues.apache.org/jira/browse/LUCENE-7580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15720070#comment-15720070
]
Paul Elschot commented on LUCENE-7580:
--------------------------------------
SpansTreeQuery is implemented as a wrapper in order to change the existing code
as little as possible.
However, it was necessary to move DisjunctionSpans out of SpanOrQuery.
DisjunctionSpans only gains additions for inspection at a match;
otherwise it is the same as in the current SpanOrQuery.
Changes to the current code are mostly additions to allow inspection of matches:
- For the ordered/unordered nearspans a common superclass ConjunctionNearSpans
is added that provides the SimScorer and a currentSlop() method.
- DisjunctionSpans allows inspection of all subspans, of the subspans at the
current doc, and of the subspans with the first and second positions.
SpanPositionQueue also has some additions for this.
- In the TermSpans constructor the currently unused SimScorer argument is saved
so it can be used to score() the various term frequencies.
- In Spans a reference to a SpansDocScorer object is added to allow direct
access by disjunctions.
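
To picture the currentSlop() method from the first item above, here is a
minimal, self-contained sketch. It does not use the Lucene classes and it is
not the code in the patch. All names except currentSlop() are hypothetical,
and both slop formulas are assumptions: the sum of the gaps between
consecutive subspans for the ordered case, and the total match width minus the
summed subspan lengths for the unordered case.

```java
/** Toy model of a common superclass for near spans; not the patch's code. */
class NearSlopSketch {

    /** Hypothetical base class holding the subspan positions of one match. */
    abstract static class Near {
        // starts[i] and ends[i] form the half-open interval of subspan i.
        protected final int[] starts;
        protected final int[] ends;

        Near(int[] starts, int[] ends) {
            if (starts.length != ends.length) {
                throw new IllegalArgumentException("starts and ends differ in length");
            }
            this.starts = starts;
            this.ends = ends;
        }

        /** Slop of the current match, available for inspection by a scorer. */
        abstract int currentSlop();
    }

    /** Ordered case (assumption): sum of the gaps between consecutive subspans. */
    static final class Ordered extends Near {
        Ordered(int[] starts, int[] ends) { super(starts, ends); }

        @Override
        int currentSlop() {
            int slop = 0;
            for (int i = 1; i < starts.length; i++) {
                slop += starts[i] - ends[i - 1];
            }
            return slop;
        }
    }

    /** Unordered case (assumption): match width minus total subspan length. */
    static final class Unordered extends Near {
        Unordered(int[] starts, int[] ends) { super(starts, ends); }

        @Override
        int currentSlop() {
            int minStart = Integer.MAX_VALUE;
            int maxEnd = Integer.MIN_VALUE;
            int totalLength = 0;
            for (int i = 0; i < starts.length; i++) {
                minStart = Math.min(minStart, starts[i]);
                maxEnd = Math.max(maxEnd, ends[i]);
                totalLength += ends[i] - starts[i];
            }
            return maxEnd - minStart - totalLength;
        }
    }

    public static void main(String[] args) {
        Near ordered = new Ordered(new int[] {2, 5, 9}, new int[] {3, 6, 10});
        Near unordered = new Unordered(new int[] {9, 2, 5}, new int[] {10, 3, 6});
        System.out.println(ordered.currentSlop() + " " + unordered.currentSlop());
    }
}
```

With a shared superclass, a scorer can ask for the slop of the current match
without knowing which of the two variants produced it.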
The only existing behaviour that changes is that needsScores is now passed
(instead of the current false) to the weights of the subqueries of SpanOrQuery
and SpanNearQuery and to the weight of the included subquery of SpanNotQuery.
All core tests pass with the patch applied on the master branch. Ant precommit
also passes.
There is a correction to the javadocs of Similarity.SimScorer on the use of
float for term frequencies.
The patch also adds a constructor for SpanOrQuery with an extra parameter
maxDistance.
When wrapped in a SpansTreeQuery, this SpanOrQuery will provide a slop factor
at each match. Where possible, the slop factor is determined by the minimum
distance between any two subspans, and this distance is capped at the given
maxDistance.
The class DisjunctionNearSpans and its SpansDocScorer implement this.
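
As a rough illustration of the maxDistance behaviour, the following
self-contained sketch computes the minimum distance between any two subspans,
caps it at maxDistance, and turns it into a slop factor. It is not the code of
DisjunctionNearSpans. The class and method names are hypothetical, and three
things are assumptions: the distance is taken as the gap between two position
intervals, a match with fewer than two subspans falls back to maxDistance, and
the slop factor is 1 / (distance + 1).

```java
/** Toy model of a distance-based slop factor; not the patch's code. */
class DisjunctionSlopSketch {

    /**
     * Gap between two half-open position intervals [start, end).
     * Touching or overlapping intervals have distance 0 (assumption).
     */
    static int distance(int start1, int end1, int start2, int end2) {
        if (end1 <= start2) {
            return start2 - end1;
        }
        if (end2 <= start1) {
            return start1 - end2;
        }
        return 0;
    }

    /**
     * Minimum pairwise distance between subspans, capped at maxDistance.
     * Each spans[i] is {start, end}. With fewer than two subspans there is
     * no pair to measure, so maxDistance is returned (assumption).
     */
    static int cappedMinDistance(int[][] spans, int maxDistance) {
        int min = maxDistance;
        for (int i = 0; i < spans.length; i++) {
            for (int j = i + 1; j < spans.length; j++) {
                int d = distance(spans[i][0], spans[i][1], spans[j][0], spans[j][1]);
                min = Math.min(min, d);
            }
        }
        return min;
    }

    /** Assumed slop factor: smaller distances give a factor closer to 1. */
    static double slopFactor(int distance) {
        return 1.0 / (distance + 1);
    }

    public static void main(String[] args) {
        int[][] spans = {{3, 4}, {7, 8}, {5, 6}};
        int d = cappedMinDistance(spans, 10);
        System.out.println("distance=" + d + " slopFactor=" + slopFactor(d));
    }
}
```

The quadratic loop over pairs is only for clarity. The message describes
inspecting the subspans with the first and second positions, which avoids
comparing every pair.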
All score calculations are done with doubles.
Most of the additions have public/protected visibility in order to allow easy
extension.
If there is interest in backporting this, a patch for branch_6x can be
made available.
The tests on branch_6x disable the coordination in BooleanQuery and they only
use the BM25 similarity.
> Spans tree scoring
> ------------------
>
> Key: LUCENE-7580
> URL: https://issues.apache.org/jira/browse/LUCENE-7580
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/search
> Affects Versions: master (7.0)
> Reporter: Paul Elschot
> Priority: Minor
> Fix For: 6.x
>
> Attachments: LUCENE-7580.patch
>
>
> Recurse the spans tree to compose a score based on the type of subqueries and
> what matched