[ 
https://issues.apache.org/jira/browse/LUCENE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16404574#comment-16404574
 ] 

Alan Woodward commented on LUCENE-8196:
---------------------------------------

I moved the approximation and reset() out of IntervalIterator, and it turned out 
that ConjunctionIntervalIterator was a convenient place to put them, so I kept 
that in.  I also cleaned up the utility classes (no need to check for 
TwoPhaseIterator or BitSetIterator when we know that the leaves are always 
PostingsEnum) and moved everything to org.apache.lucene.search.intervals.

Payload matches can be added trivially at a later point.  Payload scoring will 
be more interesting, but I'm not entirely happy with the way scoring works at 
the moment anyway; I need to read around a bit more on good ways of scoring 
proximity queries in general.  Offsets may not be trivial to add, as we have 
the same problem here that we originally had with Spans: some of the 
algorithms advance leaf intervals before returning.  Something to consider 
later on, definitely.

The PQ specialization is just lifted directly from DisjunctionDISIApproximation 
in the existing core search package, so it wasn't my conclusion :)

re extractTerms() - I don't really like this method being here in the first 
place; see my comment about scoring above.  I think we should keep it simple 
for now; you can always filter afterwards.

[~jim.ferenczi] - where can I remove null checks?  I think I'm constrained by 
having to return [-1..-1] when the iterator has moved to a new doc but hasn't 
yet been advanced over its intervals.
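
To make the [-1..-1] constraint concrete, here is a minimal sketch of the 
convention.  It is a toy class, not the IntervalIterator in the patch: 
UnpositionedIntervalSketch, moveToDoc() and the NO_MORE_INTERVALS value are 
all assumptions made for illustration.

```java
// Toy model only; names and signatures here are illustrative assumptions,
// not the API from the patch.
final class UnpositionedIntervalSketch {

    // Assumed sentinel for "no further intervals in this document".
    static final int NO_MORE_INTERVALS = Integer.MAX_VALUE;

    private int[][] current = new int[0][]; // {start, end} pairs for the current doc
    private int idx = -1;                   // -1 means "not yet positioned on an interval"

    // Simulates moving to a new document: the interval state becomes unpositioned again.
    void moveToDoc(int[][] intervals) {
        current = intervals;
        idx = -1;
    }

    int start() {
        if (idx < 0) return -1;
        return idx < current.length ? current[idx][0] : NO_MORE_INTERVALS;
    }

    int end() {
        if (idx < 0) return -1;
        return idx < current.length ? current[idx][1] : NO_MORE_INTERVALS;
    }

    // Advances to the next interval and returns its start, or NO_MORE_INTERVALS.
    int nextInterval() {
        if (idx < current.length) idx++;
        return start();
    }

    public static void main(String[] args) {
        UnpositionedIntervalSketch it = new UnpositionedIntervalSketch();
        it.moveToDoc(new int[][] {{2, 4}});
        System.out.println("[" + it.start() + ".." + it.end() + "]"); // unpositioned
        it.nextInterval();
        System.out.println("[" + it.start() + ".." + it.end() + "]"); // first interval
    }
}
```

In this toy version the unpositioned state is carried by an index rather than 
a nullable field, so start() and end() need no null check.  Whether the same 
trick applies to the real iterators depends on how they hold their state.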

The disjunction order is to address LUCENE-7398.  If a disjunction appears 
within a block, then sorting for minimum intervals can miss valid matches.  The 
Vigna paper doesn't seem to discuss this case anywhere.  One possible solution 
that would work in all cases would be to add a boolean to 
IntervalsSource.getIntervals() that indicates whether or not the source is in a 
final position: if it is, then it should return minimal intervals; otherwise 
it should return wider ones.  This would only be applicable to disjunctions.
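
A toy illustration of the problem, using plain Java rather than the classes in 
the patch.  The document is "a b c d" and the query is a block of "a", then 
("b" OR "b c"), then "d".  Interval, minimize(), disjunction() and 
blockMatches() are hypothetical names, and the finalPosition flag only mimics 
the boolean suggested above; none of this is existing Lucene API.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: positions are token offsets; all names are illustrative assumptions.
final class DisjunctionInBlockSketch {

    static final class Interval {
        final int start;
        final int end;

        Interval(int start, int end) {
            this.start = start;
            this.end = end;
        }

        // True if this interval contains 'o' and is not identical to it.
        boolean strictlyContains(Interval o) {
            return start <= o.start && o.end <= end && (start != o.start || end != o.end);
        }
    }

    // Keeps only the intervals that do not contain another interval from the list.
    static List<Interval> minimize(List<Interval> in) {
        List<Interval> out = new ArrayList<>();
        for (Interval a : in) {
            boolean minimal = true;
            for (Interval b : in) {
                if (a.strictlyContains(b)) {
                    minimal = false;
                    break;
                }
            }
            if (minimal) out.add(a);
        }
        return out;
    }

    // Hypothetical flag: minimal intervals in a final position, wider ones otherwise.
    static List<Interval> disjunction(List<Interval> all, boolean finalPosition) {
        return finalPosition ? minimize(all) : all;
    }

    // A block needs 'first', one middle interval and 'last' to be directly adjacent.
    static boolean blockMatches(Interval first, List<Interval> middle, Interval last) {
        for (Interval m : middle) {
            if (m.start == first.end + 1 && last.start == m.end + 1) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // Document "a b c d": a=0, b=1, c=2, d=3
        Interval a = new Interval(0, 0);
        Interval d = new Interval(3, 3);
        // "b" matches [1..1]; "b c" matches [1..2]
        List<Interval> either = List.of(new Interval(1, 1), new Interval(1, 2));

        // Minimal intervals only: [1..2] is dropped because it contains [1..1].
        System.out.println(blockMatches(a, disjunction(either, true), d));  // false
        // Wider intervals kept: [1..2] sits directly between "a" and "d".
        System.out.println(blockMatches(a, disjunction(either, false), d)); // true
    }
}
```

The valid match "a [b c] d" is only found when the disjunction is allowed to 
return the non-minimal interval, which is the case the flag is meant to cover.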

> Add IntervalQuery and IntervalsSource to expose minimum interval semantics 
> across term fields
> ---------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-8196
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8196
>             Project: Lucene - Core
>          Issue Type: New Feature
>            Reporter: Alan Woodward
>            Assignee: Alan Woodward
>            Priority: Major
>         Attachments: LUCENE-8196.patch, LUCENE-8196.patch, LUCENE-8196.patch, 
> LUCENE-8196.patch
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> This ticket proposes an alternative implementation of the SpanQuery family 
> that uses minimum-interval semantics from 
> [http://vigna.di.unimi.it/ftp/papers/EfficientAlgorithmsMinimalIntervalSemantics.pdf]
>  to implement positional queries across term-based fields.  Rather than using 
> TermQueries to construct the interval operators, as in LUCENE-2878 or the 
> current Spans implementation, we instead use a new IntervalsSource object, 
> which will produce IntervalIterators over a particular segment and field.  
> These are constructed using various static helper methods, and can then be 
> passed to a new IntervalQuery which will return documents that contain one or 
> more intervals so defined.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

