[
https://issues.apache.org/jira/browse/LUCENE-7398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15465301#comment-15465301
]
Christoph Goller commented on LUCENE-7398:
------------------------------------------
Paul's fix almost convinced me. Unfortunately, it does not fix the case where an
intermediate span has a longer match that reduces the overall sloppiness but
overlaps with a match of a subsequent span, and consequently requires advancing
the subsequent span. Here is an example:
Document: w1 w2 w3 w4 w5
near/0(w1, or(w2, near/0(w2, w3, w4)), or(w5, near/0(w4, w5)))
Add the following code to the end of TestSpanCollection.testNestedNearQuery():
{code}
// Second clause: or(w2, near/0(w2, w3, w4))
SpanNearQuery q234 = new SpanNearQuery(new SpanQuery[]{q2, q3, q4}, 0, true);
SpanOrQuery q2234 = new SpanOrQuery(q2, q234);
// Third clause: or(w5, near/0(w4, w5))
SpanTermQuery p5 = new SpanTermQuery(new Term(FIELD, "w5"));
SpanNearQuery q45 = new SpanNearQuery(new SpanQuery[]{q4, p5}, 0, true);
SpanOrQuery q455 = new SpanOrQuery(q45, p5);
// Outer query: near/0(w1, <second clause>, <third clause>)
SpanNearQuery q1q2234q445 = new SpanNearQuery(new SpanQuery[]{q1, q2234, q455}, 0, true);
spans = q1q2234q445.createWeight(searcher, false, 1f)
    .getSpans(searcher.getIndexReader().leaves().get(0), SpanWeight.Postings.POSITIONS);
// The document should match, so advance(0) should return document 0.
assertEquals(0, spans.advance(0));
{code}
I think we can only fix this if we give up lazy iteration. I don't think this
would be so bad for performance. If we implement clever caching of positions in
spans, a complete backtracking would consist of only a few additional
int comparisons. The expensive operation is iterating over all span positions
(IO), and we already do this in advancePosition(Spans, int), don't we?
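To illustrate the idea, here is a minimal standalone sketch (not Lucene code) of backtracking over cached positions for the example document above. The class name CachedOrderedNear and both method names are hypothetical. The sketch assumes that slop is the sum of the gaps between consecutive sub-spans, that span ends are exclusive, and that each clause's positions have already been read into an array sorted by start and then by end. The greedy method is a deliberately simplified stand-in for lazy iteration, not the actual NearSpansOrdered logic.

```java
public class CachedOrderedNear {

    // Exhaustive variant: tries every cached span of every clause and backtracks.
    // clauses[i] lists the cached {start, end} pairs (end exclusive) of clause i.
    public static boolean matchesWithBacktracking(int[][][] clauses, int slop) {
        return search(clauses, 0, 0, slop);
    }

    private static boolean search(int[][][] clauses, int i, int prevEnd, int slopLeft) {
        if (i == clauses.length) {
            return true; // one span has been chosen from every clause
        }
        for (int[] span : clauses[i]) {
            int gap = (i == 0) ? 0 : span[0] - prevEnd;
            if (gap < 0 || gap > slopLeft) {
                continue; // overlaps the previous span, or uses too much slop
            }
            if (search(clauses, i + 1, span[1], slopLeft - gap)) {
                return true;
            }
        }
        return false;
    }

    // Simplified greedy stand-in for lazy iteration: commits to the first
    // non-overlapping span of each clause and never reconsiders that choice.
    public static boolean matchesGreedy(int[][][] clauses, int slop) {
        int prevEnd = 0;
        int slopLeft = slop;
        for (int i = 0; i < clauses.length; i++) {
            int[] chosen = null;
            for (int[] span : clauses[i]) {
                if (i == 0 || span[0] >= prevEnd) {
                    chosen = span;
                    break;
                }
            }
            if (chosen == null) {
                return false;
            }
            if (i > 0) {
                slopLeft -= chosen[0] - prevEnd;
            }
            if (slopLeft < 0) {
                return false;
            }
            prevEnd = chosen[1];
        }
        return true;
    }

    public static void main(String[] args) {
        // Cached positions for the document "w1 w2 w3 w4 w5".
        int[][][] clauses = {
            {{0, 1}},         // w1
            {{1, 2}, {1, 4}}, // or(w2, near/0(w2, w3, w4))
            {{3, 5}, {4, 5}}, // or(w5, near/0(w4, w5))
        };
        System.out.println(matchesGreedy(clauses, 0));           // false
        System.out.println(matchesWithBacktracking(clauses, 0)); // true
    }
}
```

Once the positions are cached, the backtracking itself only compares ints. The match it finds is w1 at [0,1), near/0(w2, w3, w4) at [1,4), and w5 at [4,5), which requires skipping the third clause's first span [3,5).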
> Nested Span Queries are buggy
> -----------------------------
>
> Key: LUCENE-7398
> URL: https://issues.apache.org/jira/browse/LUCENE-7398
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/search
> Affects Versions: 5.5, 6.x
> Reporter: Christoph Goller
> Assignee: Alan Woodward
> Priority: Critical
> Attachments: LUCENE-7398-20160814.patch, LUCENE-7398.patch,
> LUCENE-7398.patch, TestSpanCollection.java
>
>
> Example for a nested SpanQuery that is not working:
> Document: Human Genome Organization , HUGO , is trying to coordinate gene
> mapping research worldwide.
> Query: spanNear([body:coordinate, spanOr([spanNear([body:gene, body:mapping],
> 0, true), body:gene]), body:research], 0, true)
> The query should match "coordinate gene mapping research" as well as
> "coordinate gene research". It does not match "coordinate gene mapping
> research" with Lucene 5.5 or 6.1, it did however match with Lucene 4.10.4. It
> probably stopped working with the changes on SpanQueries in 5.3. I will
> attach a unit test that shows the problem.