[ https://issues.apache.org/jira/browse/LUCENE-7398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15465301#comment-15465301 ]

Christoph Goller commented on LUCENE-7398:
------------------------------------------

Paul's fix almost convinced me. Unfortunately, it does not fix the case where an
intermediate span has a longer match that reduces the overall sloppiness but
overlaps with a match of a subsequent span, and therefore requires advancing
the subsequent span. Here is an example:

Document: w1 w2 w3 w4 w5
Query: near/0(w1, or(w2, near/0(w2, w3, w4)), or(w5, near/0(w4, w5)))

Add the following code to the end of TestSpanCollection.testNestedNearQuery():

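{code}
// q1..q4, spans, searcher and FIELD come from earlier in testNestedNearQuery();
// q1..q4 are assumed to be the SpanTermQuerys for w1..w4.

// or(w2, near/0(w2, w3, w4))
SpanNearQuery q234 = new SpanNearQuery(new SpanQuery[]{q2, q3, q4}, 0, true);
SpanOrQuery q2234 = new SpanOrQuery(q2, q234);

// or(w5, near/0(w4, w5))
SpanTermQuery p5 = new SpanTermQuery(new Term(FIELD, "w5"));
SpanNearQuery q45 = new SpanNearQuery(new SpanQuery[]{q4, p5}, 0, true);
SpanOrQuery q455 = new SpanOrQuery(q45, p5);

// near/0(w1, or(w2, near/0(w2, w3, w4)), or(w5, near/0(w4, w5)))
SpanNearQuery q1q2234q455 = new SpanNearQuery(new SpanQuery[]{q1, q2234, q455}, 0, true);
spans = q1q2234q455.createWeight(searcher, false, 1f)
    .getSpans(searcher.getIndexReader().leaves().get(0), SpanWeight.Postings.POSITIONS);
// Expected to match doc 0 as w1 [w2 w3 w4] w5; currently fails.
assertEquals(0, spans.advance(0));
{code}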
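The failure can be reproduced outside Lucene with a simplified model (this is not the NearSpansOrdered code). Each clause's matches in the document w1 w2 w3 w4 w5 are listed as [start, end) positions in the order a SpanOrQuery returns them (by start, then by end). A hypothetical forward-only matcher takes, for each clause, the first span starting at or after the end of the previous clause's span and never revisits that choice. The class, record and method names are made up for illustration.

```java
import java.util.List;

public class LazyNearModel {

    // Hypothetical model of a span: token positions [start, end). Needs Java 16+.
    record Span(int start, int end) {}

    // Forward-only ("lazy") ordered matching, a simplified illustration, not Lucene code:
    // for each clause pick the first span starting at or after the previous span's end
    // and never revisit that choice. Returns the total gap, or -1 if a clause has no candidate.
    static int lazySlop(List<List<Span>> clauses) {
        int prevEnd = -1;
        int slop = 0;
        for (List<Span> clause : clauses) {
            Span chosen = null;
            for (Span s : clause) {
                if (s.start() >= prevEnd) {
                    chosen = s;
                    break;
                }
            }
            if (chosen == null) {
                return -1;
            }
            if (prevEnd >= 0) {
                slop += chosen.start() - prevEnd; // gap between consecutive spans
            }
            prevEnd = chosen.end();
        }
        return slop;
    }

    // Document "w1 w2 w3 w4 w5" at positions 0..4; spans sorted by start, then end.
    static List<List<Span>> exampleClauses() {
        return List.of(
            List.of(new Span(0, 1)),                  // w1
            List.of(new Span(1, 2), new Span(1, 4)),  // or(w2, near/0(w2, w3, w4))
            List.of(new Span(3, 5), new Span(4, 5))); // or(w5, near/0(w4, w5))
    }

    public static void main(String[] args) {
        // Picks w1[0,1), w2[1,2), w4 w5[3,5): gap 1, so near/0 rejects the document.
        System.out.println("lazy slop = " + lazySlop(exampleClauses())); // lazy slop = 1
    }
}
```

The lazy choice of w2 [1,2) leaves a one-position gap before w4 w5 [3,5). The zero-slop match needs the longer alternative w2 w3 w4 [1,4), and then the third clause has to be advanced past [3,5), which overlaps it, to w5 [4,5).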

I think we can only fix this if we give up lazy iteration. I don't think that
would be so bad for performance. If we implement clever caching of positions in
Spans, complete backtracking would only cost a few additional int comparisons.
The expensive operation is iterating over all span positions (IO), and we
already do that in advancePosition(Spans, int), don't we?
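A minimal sketch of that idea, assuming each clause's positions in the current document have already been read once and cached as int arrays sorted by start, then end (the IO-bound part). After that, the backtracking search is only int comparisons. CachedSpans, firstMatch and matchRest are hypothetical names, not Lucene API.

```java
import java.util.Arrays;

public class BacktrackingNearModel {

    // Hypothetical per-document cache of one clause's spans:
    // parallel arrays sorted by start position, then end position. Needs Java 16+.
    record CachedSpans(int[] starts, int[] ends) {}

    // Returns {start, end} of the first ordered, non-overlapping match of all
    // clauses whose total gap is at most maxSlop, or null if there is none.
    static int[] firstMatch(CachedSpans[] clauses, int maxSlop) {
        CachedSpans first = clauses[0];
        for (int j = 0; j < first.starts().length; j++) {
            int end = clauses.length == 1
                ? first.ends()[j]
                : matchRest(clauses, 1, first.ends()[j], maxSlop);
            if (end >= 0) {
                return new int[] {first.starts()[j], end};
            }
        }
        return null;
    }

    // Tries every cached span of clause i, backtracking when the rest fails.
    // Returns the end position of the last clause's span, or -1.
    private static int matchRest(CachedSpans[] clauses, int i, int prevEnd, int slopLeft) {
        CachedSpans c = clauses[i];
        for (int j = 0; j < c.starts().length; j++) {
            int start = c.starts()[j];
            if (start < prevEnd) {
                continue; // overlaps the previous clause's span
            }
            int gap = start - prevEnd;
            if (gap > slopLeft) {
                break; // starts are sorted, so later spans only have larger gaps
            }
            int end = i == clauses.length - 1
                ? c.ends()[j]
                : matchRest(clauses, i + 1, c.ends()[j], slopLeft - gap);
            if (end >= 0) {
                return end;
            }
        }
        return -1;
    }

    // Document "w1 w2 w3 w4 w5" at positions 0..4.
    static CachedSpans[] exampleClauses() {
        return new CachedSpans[] {
            new CachedSpans(new int[] {0}, new int[] {1}),       // w1
            new CachedSpans(new int[] {1, 1}, new int[] {2, 4}), // or(w2, near/0(w2, w3, w4))
            new CachedSpans(new int[] {3, 4}, new int[] {5, 5})  // or(w5, near/0(w4, w5))
        };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(firstMatch(exampleClauses(), 0))); // [0, 5]
    }
}
```

The search rejects w2 [1,2), backtracks to w2 w3 w4 [1,4), skips the overlapping [3,5) and accepts w5 [4,5). The sketch stops at the first match for the earliest start; a real Spans implementation would also have to move on to later matches in the same document.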

> Nested Span Queries are buggy
> -----------------------------
>
>                 Key: LUCENE-7398
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7398
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/search
>    Affects Versions: 5.5, 6.x
>            Reporter: Christoph Goller
>            Assignee: Alan Woodward
>            Priority: Critical
>         Attachments: LUCENE-7398-20160814.patch, LUCENE-7398.patch, 
> LUCENE-7398.patch, TestSpanCollection.java
>
>
> Example of a nested SpanQuery that is not working:
> Document: Human Genome Organization , HUGO , is trying to coordinate gene 
> mapping research worldwide.
> Query: spanNear([body:coordinate, spanOr([spanNear([body:gene, body:mapping], 
> 0, true), body:gene]), body:research], 0, true)
> The query should match "coordinate gene mapping research" as well as 
> "coordinate gene research". It does not match "coordinate gene mapping 
> research" with Lucene 5.5 or 6.1; it did, however, match with Lucene 4.10.4. It 
> probably stopped working with the changes to SpanQueries in 5.3. I will 
> attach a unit test that shows the problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
