[jira] [Commented] (LUCENE-7398) Nested Span Queries are buggy

Artem Lukanin (JIRA) Mon, 16 Jan 2017 09:06:53 -0800

    [ https://issues.apache.org/jira/browse/LUCENE-7398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15824303#comment-15824303 ]


Artem Lukanin commented on LUCENE-7398:
---------------------------------------

The patch has a bug. The following sentence is not matched because the
look-ahead is too greedy:
"the system of claim 16 further comprising a user location unit adapted to
determine user location based on location information received from the user's
device"

{code:java}
  @Test
  public void testNestedOrQueryLookAhead() throws IOException {
    SpanNearQuery snq = new SpanNearQuery.Builder(FIELD, SpanNearQuery.MatchNear.ORDERED_LOOKAHEAD)
        .addClause(new SpanOrQuery(
            new SpanTermQuery(new Term(FIELD, "user")),
            new SpanTermQuery(new Term(FIELD, "ue"))
        ))
        .addClause(new SpanNearQuery.Builder(FIELD, SpanNearQuery.MatchNear.ORDERED_LOOKAHEAD)
            .setSlop(3)
            .addClause(new SpanTermQuery(new Term(FIELD, "location")))
            .addClause(new SpanTermQuery(new Term(FIELD, "information")))
            .build()
        )
        .build();

    Spans spans = snq.createWeight(searcher, false)
        .getSpans(searcher.getIndexReader().leaves().get(0), SpanWeight.Postings.POSITIONS);
    assertEquals(6, spans.advance(0));
    assertEquals(Spans.NO_MORE_DOCS, spans.nextDoc());
  }
{code}

The fix is simple: shrinkToDecreaseSlop() needs one additional check in the
loop over subSpans[0], so that it stops advancing as soon as the slop is small
enough:
{code:java}
  /** The subSpans are ordered in the same doc and matchSlop is too big.
   * Try to decrease the slop by calling nextStartPosition() on all subSpans except the last one, in reverse order.
   * Return true iff an ordered match was found with small enough slop.
   */
  private boolean shrinkToDecreaseSlop() throws IOException {
    int lastStart = subSpans[subSpans.length - 1].startPosition();

    for (int i = subSpans.length - 2; i >= 1; i--) { // intermediate spans for subSpans.length >= 3
      Spans prevSpans = subSpans[i];
      int prevStart = prevSpans.startPosition();
      int prevEnd = prevSpans.endPosition();
      while (true) { // Advance prevSpans until it is after (lastStart, lastEnd) or the slop increases.
        if (prevSpans.nextStartPosition() == NO_MORE_POSITIONS) {
          oneExhaustedInCurrentDoc = true;
          break; // Check remaining subSpans for final match in current doc
        } else {
          int ppEnd = prevSpans.endPosition();
          if (ppEnd > lastStart) { // no more ordered
            break; // Check remaining subSpans.
          } else { // prevSpans still before lastStart
            int ppStart = prevSpans.startPosition();
            int slopIncrease = (prevEnd - prevStart) - (ppEnd - ppStart); // span length decrease is slop increase
            if (slopIncrease > 0) {
              break; // Check remaining subSpans.
            } else { // slop did not increase
              prevStart = ppStart;
              prevEnd = ppEnd;
              matchSlop += slopIncrease;
            }
          }
        }
      }
      lastStart = prevStart;
    }

    while (true) { // for subSpans[0] only the end position influences the match slop.
      int prevEnd = subSpans[0].endPosition();
      if (subSpans[0].nextStartPosition() == NO_MORE_POSITIONS) {
        oneExhaustedInCurrentDoc = true;
        break;
      }
      int ppEnd = subSpans[0].endPosition();
      if (ppEnd > lastStart) { // no more ordered
        break;
      }
      int slopIncrease = prevEnd - ppEnd;
      if (slopIncrease > 0) {
        break;
      }
      // slop did not increase:
      matchStart = subSpans[0].startPosition();
      matchSlop += slopIncrease;

      // FIX STARTS
      if (matchSlop <= allowedSlop) {
        break;
      }
      // FIX ENDS
    }

    firstSubSpansAfterMatch = true;
    boolean match = matchSlop <= allowedSlop;
    return match; // ordered and allowed slop
  }
{code}
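To make the greedy behaviour concrete, here is a toy model of just the subSpans[0] loop for the inner query ("location" followed by "information", slop 3). It is not Lucene code; the class and method names are hypothetical. It assumes one position per token counted from 0, so "location" occurs at 9, 15 and 18 and "information" at 19, and it measures slop as the gap between the two clauses. Starting from location(9) (slop 9), the loop without the fix keeps advancing while the slop shrinks and ends at location(18) (slop 0), skipping location(15) (slop 3). The outer query needs exactly that skipped span, because "user" at 14 must immediately precede the inner match (outer slop 0).

```java
import java.util.Arrays;

/**
 * Hypothetical toy model (not Lucene code) of the subSpans[0] loop in
 * shrinkToDecreaseSlop() for an ordered near query with two single-term clauses.
 * An occurrence at position p is the span [p, p + 1); the last clause is fixed
 * at lastStart. Assumes firstPositions is sorted and firstPositions[0] + 1 <= lastStart.
 */
final class ShrinkFirstClauseDemo {

  /** Returns {matchStart, matchSlop} after shrinking the first clause. */
  static int[] shrinkFirst(int[] firstPositions, int lastStart, int allowedSlop, boolean withFix) {
    int idx = 0;                                          // current occurrence of the first clause
    int matchStart = firstPositions[0];
    int matchSlop = lastStart - (firstPositions[0] + 1);  // gap to the last clause
    while (true) {
      int prevEnd = firstPositions[idx] + 1;
      idx++;                                              // models nextStartPosition()
      if (idx == firstPositions.length) {
        break;                                            // models NO_MORE_POSITIONS
      }
      int ppEnd = firstPositions[idx] + 1;
      if (ppEnd > lastStart) {
        break;                                            // no longer ordered before the last clause
      }
      int slopIncrease = prevEnd - ppEnd;
      if (slopIncrease > 0) {
        break;
      }
      matchStart = firstPositions[idx];                   // slop did not increase
      matchSlop += slopIncrease;
      if (withFix && matchSlop <= allowedSlop) {
        break;                                            // the proposed early exit
      }
    }
    return new int[] {matchStart, matchSlop};
  }

  public static void main(String[] args) {
    int[] location = {9, 15, 18};  // "location" in the example sentence
    int information = 19;          // "information"
    System.out.println("without fix: " + Arrays.toString(shrinkFirst(location, information, 3, false)));
    System.out.println("with fix:    " + Arrays.toString(shrinkFirst(location, information, 3, true)));
  }
}
```

Under these assumptions it prints [18, 0] without the check and [15, 3] with it: the early exit stops at the first position whose slop is within allowedSlop, so the match starting at 15 is the one reported.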

Sorry for not providing a new patch; I'm working with an older version of Lucene.

> Nested Span Queries are buggy
> -----------------------------
>
>                 Key: LUCENE-7398
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7398
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/search
>    Affects Versions: 5.5, 6.x
>            Reporter: Christoph Goller
>            Assignee: Alan Woodward
>            Priority: Critical
>         Attachments: LUCENE-7398-20160814.patch, LUCENE-7398-20160924.patch, 
> LUCENE-7398-20160925.patch, LUCENE-7398.patch, LUCENE-7398.patch, 
> LUCENE-7398.patch, TestSpanCollection.java
>
>
> Example for a nested SpanQuery that is not working:
> Document: Human Genome Organization , HUGO , is trying to coordinate gene 
> mapping research worldwide.
> Query: spanNear([body:coordinate, spanOr([spanNear([body:gene, body:mapping], 
> 0, true), body:gene]), body:research], 0, true)
> The query should match "coordinate gene mapping research" as well as 
> "coordinate gene research". It does not match "coordinate gene mapping 
> research" with Lucene 5.5 or 6.1; it did, however, match with Lucene 4.10.4.
> It probably stopped working with the changes to SpanQueries in 5.3. I will 
> attach a unit test that shows the problem.
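One way to pin down the expected semantics of the nested query above is a brute-force reference matcher that tries every combination of sub-spans, instead of advancing iterators incrementally as Lucene's Spans do. The sketch below is a hypothetical test oracle, not Lucene API. It assumes one position per token with punctuation dropped, spans as half-open [start, end) intervals, and ordered slop measured as the sum of the gaps between consecutive sub-spans.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical brute-force oracle for ordered span-near semantics (not Lucene code). */
final class NestedSpanReference {

  /** All occurrences of a term as half-open spans [p, p + 1). */
  static List<int[]> term(String[] tokens, String t) {
    List<int[]> out = new ArrayList<>();
    for (int p = 0; p < tokens.length; p++) {
      if (tokens[p].equals(t)) {
        out.add(new int[] {p, p + 1});
      }
    }
    return out;
  }

  /** Union of the spans of all alternatives, like spanOr. */
  @SafeVarargs
  static List<int[]> or(List<int[]>... alternatives) {
    List<int[]> out = new ArrayList<>();
    for (List<int[]> alt : alternatives) {
      out.addAll(alt);
    }
    return out;
  }

  /** Ordered near: every combination of sub-spans whose gaps sum to at most slop. */
  @SafeVarargs
  static List<int[]> nearOrdered(int slop, List<int[]>... clauses) {
    List<int[]> out = new ArrayList<>();
    for (int[] first : clauses[0]) {
      extend(clauses, 1, first[0], first[1], 0, slop, out);
    }
    return out;
  }

  private static void extend(List<int[]>[] clauses, int i, int start, int prevEnd,
                             int gaps, int slop, List<int[]> out) {
    if (i == clauses.length) {
      out.add(new int[] {start, prevEnd});  // all clauses placed: record the match
      return;
    }
    for (int[] s : clauses[i]) {
      int gap = s[0] - prevEnd;             // a negative gap means overlap or wrong order
      if (gap >= 0 && gaps + gap <= slop) {
        extend(clauses, i + 1, start, s[1], gaps + gap, slop, out);
      }
    }
  }

  /** spanNear([coordinate, spanOr([spanNear([gene, mapping], 0, true), gene]), research], 0, true) */
  static List<int[]> exampleQuery(String[] doc) {
    return nearOrdered(0,
        term(doc, "coordinate"),
        or(nearOrdered(0, term(doc, "gene"), term(doc, "mapping")), term(doc, "gene")),
        term(doc, "research"));
  }

  public static void main(String[] args) {
    String[] doc = "human genome organization hugo is trying to coordinate gene mapping research worldwide"
        .split(" ");
    for (int[] m : exampleQuery(doc)) {
      System.out.println(Arrays.toString(m));
    }
  }
}
```

For the example document this reports a single match, [7, 11], covering "coordinate gene mapping research"; on a text containing "coordinate gene research" the same query matches through the plain gene alternative. Because it enumerates all combinations, such an oracle can be compared against the Spans returned by a real SpanNearQuery in a unit test.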



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
