[ https://issues.apache.org/jira/browse/LUCENE-7580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15720070#comment-15720070 ]

Paul Elschot commented on LUCENE-7580:
--------------------------------------

SpansTreeQuery is implemented as a wrapper in order to change the existing code as little as possible.
However, it was necessary to take DisjunctionSpans out of SpanOrQuery.
DisjunctionSpans only adds methods for inspecting a match; otherwise it is the same as the code currently in SpanOrQuery.

Changes to the current code are mostly additions that allow inspection of matches:
- For the ordered and unordered near spans, a common superclass ConjunctionNearSpans is added that provides the SimScorer and a currentSlop() method.
- DisjunctionSpans allows inspection of all subspans, of the subspans at the current doc, and of the subspans with the first and second positions. SpanPositionQueue also has some additions for this.
- The TermSpans constructor now saves its currently unused SimScorer argument, so that it can be used to score() the various term frequencies.
- Spans gets a reference to a SpansDocScorer object, to allow direct access by disjunctions.
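A minimal Java sketch, not the patch's code, of how a position queue can expose the subspans with the first and second positions without losing entries. It assumes the queue is ordered by start position and then by end position; SubMatch, BY_POSITION and firstAndSecond are hypothetical names, and java.util.PriorityQueue stands in for Lucene's Spans and SpanPositionQueue.

```java
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class FirstTwoPositionsSketch {

    // Hypothetical stand-in for a subspan at a match: token positions [start, end).
    record SubMatch(String name, int start, int end) {}

    // Assumed queue order: by start position, then by end position.
    static final Comparator<SubMatch> BY_POSITION =
            Comparator.comparingInt(SubMatch::start).thenComparingInt(SubMatch::end);

    // Returns the subspans with the first and second positions; second is null if there is only one.
    static SubMatch[] firstAndSecond(PriorityQueue<SubMatch> queue) {
        SubMatch first = queue.poll();
        if (first == null) {
            return new SubMatch[] {null, null};
        }
        SubMatch second = queue.peek();
        queue.add(first); // put the first entry back so the queue is unchanged for the caller
        return new SubMatch[] {first, second};
    }

    public static void main(String[] args) {
        PriorityQueue<SubMatch> queue = new PriorityQueue<>(BY_POSITION);
        queue.addAll(List.of(
                new SubMatch("c", 9, 10),
                new SubMatch("a", 2, 3),
                new SubMatch("b", 5, 7)));
        SubMatch[] top = firstAndSecond(queue);
        System.out.println(top[0].name() + " " + top[1].name()); // prints "a b"
    }
}
```

Polling the head and re-adding it keeps the cost at O(log n) per inspection, rather than copying or sorting all subspans.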

The only change to existing behavior is that the weights of the subqueries of SpanOrQuery and SpanNearQuery, and the weight of the included subquery of SpanNotQuery, are now created with needsScores instead of the current false.

All core tests pass with the patch applied on the master branch, and ant precommit also passes.

There is a correction to the javadocs of Similarity.SimScorer on the use of float for term frequencies.
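For context on why a term frequency can be a float: in span scoring, each match in a document is assumed here to contribute a slop factor based on its distance rather than a count of 1, so the summed frequency is generally fractional. The sketch below shows that accumulation in doubles with the 1/(distance + 1) form, which matches the sloppy frequency of Lucene's classic and BM25 similarities, although a Similarity may define its own. slopFactor and sloppyFreq are hypothetical helper names, not Lucene API.

```java
import java.util.List;

public class SloppyFreqSketch {

    // Assumed slop factor: an exact match (distance 0) counts 1, farther matches count less.
    static double slopFactor(int distance) {
        return 1.0 / (distance + 1);
    }

    // Sums one slop factor per match in a document, so the "frequency" is usually not an integer.
    static double sloppyFreq(List<Integer> matchDistances) {
        double freq = 0.0;
        for (int distance : matchDistances) {
            freq += slopFactor(distance);
        }
        return freq;
    }

    public static void main(String[] args) {
        // Three matches with distances 0, 1 and 3: 1 + 0.5 + 0.25
        System.out.println(sloppyFreq(List.of(0, 1, 3))); // prints 1.75
    }
}
```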

The patch also adds a SpanOrQuery constructor with an extra parameter, maxDistance.
When wrapped in a SpansTreeQuery, such a SpanOrQuery provides a slop factor at each match, determined by the minimum distance between any two subspans where possible, with this distance capped at the given maxDistance.
The class DisjunctionNearSpans and its SpansDocScorer implement this.
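A minimal sketch of the distance described above, under assumptions the comment does not spell out: each matching subspan is treated as a [start, end) position interval, the distance between two subspans is the gap between their intervals (0 when they overlap), and with fewer than two matching subspans the distance falls back to maxDistance. A similarity would then turn the returned distance into a slop factor. cappedMinDistance is a hypothetical name; this is not the DisjunctionNearSpans code.

```java
import java.util.Arrays;
import java.util.Comparator;

public class DisjunctionDistanceSketch {

    // Minimum gap between any two [start, end) matches, capped at maxDistance.
    // With fewer than two matches there is no pair, so maxDistance is returned (assumption).
    static int cappedMinDistance(int[][] matches, int maxDistance) {
        if (matches.length < 2) {
            return maxDistance;
        }
        int[][] sorted = matches.clone(); // shallow copy: the caller's array order is left alone
        Arrays.sort(sorted, Comparator.comparingInt((int[] m) -> m[0]));
        int maxEndSoFar = sorted[0][1];
        int minDistance = Integer.MAX_VALUE;
        for (int i = 1; i < sorted.length; i++) {
            // Gap to the earlier match that reaches furthest right; overlapping matches give 0.
            int gap = Math.max(0, sorted[i][0] - maxEndSoFar);
            minDistance = Math.min(minDistance, gap);
            maxEndSoFar = Math.max(maxEndSoFar, sorted[i][1]);
        }
        return Math.min(minDistance, maxDistance);
    }

    public static void main(String[] args) {
        // Sorted: [2,3), [5,6), [10,11); gaps 2 and 4, so the minimum is 2.
        System.out.println(cappedMinDistance(new int[][] {{10, 11}, {2, 3}, {5, 6}}, 4)); // prints 2
        // Gap 19 exceeds maxDistance, so the result is capped at 4.
        System.out.println(cappedMinDistance(new int[][] {{0, 1}, {20, 21}}, 4)); // prints 4
    }
}
```

Sorting by start and tracking the furthest end seen so far finds the minimum pairwise gap in O(n log n) without comparing every pair of subspans.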

All score calculations are done with doubles.
Most of the additions have public or protected visibility to allow easy extension.

In case there is interest in backporting this, a patch for branch_6x can be made available.
The tests on branch_6x disable coordination in BooleanQuery and use only the BM25 similarity.



> Spans tree scoring
> ------------------
>
>                 Key: LUCENE-7580
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7580
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/search
>    Affects Versions: master (7.0)
>            Reporter: Paul Elschot
>            Priority: Minor
>             Fix For: 6.x
>
>         Attachments: LUCENE-7580.patch
>
>
> Recurse the spans tree to compose a score based on the type of subqueries and what matched



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
