[ https://issues.apache.org/jira/browse/LUCENE-7398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15824303#comment-15824303 ]
Artem Lukanin commented on LUCENE-7398:
---------------------------------------
The patch has a bug: the look-ahead is too greedy, so the following sentence
is not found by the query in the test below: "the system of claim 16 further
comprising a user location unit adapted to determine user location based on
location information received from the user's device"
{code:java}
@Test
public void testNestedOrQueryLookAhead() throws IOException {
  SpanNearQuery snq = new SpanNearQuery.Builder(FIELD, SpanNearQuery.MatchNear.ORDERED_LOOKAHEAD)
      .addClause(new SpanOrQuery(
          new SpanTermQuery(new Term(FIELD, "user")),
          new SpanTermQuery(new Term(FIELD, "ue"))
      ))
      .addClause(new SpanNearQuery.Builder(FIELD, SpanNearQuery.MatchNear.ORDERED_LOOKAHEAD)
          .setSlop(3)
          .addClause(new SpanTermQuery(new Term(FIELD, "location")))
          .addClause(new SpanTermQuery(new Term(FIELD, "information")))
          .build()
      )
      .build();

  Spans spans = snq.createWeight(searcher, false)
      .getSpans(searcher.getIndexReader().leaves().get(0), SpanWeight.Postings.POSITIONS);
  assertEquals(6, spans.advance(0));
  assertEquals(Spans.NO_MORE_DOCS, spans.nextDoc());
}
{code}
The fix is simple: add a check inside shrinkToDecreaseSlop() that stops
advancing subSpans[0] as soon as the match slop is within the allowed slop:
{code:java}
/**
 * The subSpans are ordered in the same doc and matchSlop is too big.
 * Try and decrease the slop by calling nextStartPosition() on all subSpans
 * except the last one in reverse order.
 * Return true iff an ordered match was found with small enough slop.
 */
private boolean shrinkToDecreaseSlop() throws IOException {
  int lastStart = subSpans[subSpans.length - 1].startPosition();
  for (int i = subSpans.length - 2; i >= 1; i--) { // intermediate spans for subSpans.length >= 3
    Spans prevSpans = subSpans[i];
    int prevStart = prevSpans.startPosition();
    int prevEnd = prevSpans.endPosition();
    while (true) { // Advance prevSpans until it is after (lastStart, lastEnd) or the slop increases.
      if (prevSpans.nextStartPosition() == NO_MORE_POSITIONS) {
        oneExhaustedInCurrentDoc = true;
        break; // Check remaining subSpans for final match in current doc
      } else {
        int ppEnd = prevSpans.endPosition();
        if (ppEnd > lastStart) { // no more ordered
          break; // Check remaining subSpans.
        } else { // prevSpans still before lastStart
          int ppStart = prevSpans.startPosition();
          // span length decrease is slop increase
          int slopIncrease = (prevEnd - prevStart) - (ppEnd - ppStart);
          if (slopIncrease > 0) {
            break; // Check remaining subSpans.
          } else { // slop did not increase
            prevStart = ppStart;
            prevEnd = ppEnd;
            matchSlop += slopIncrease;
          }
        }
      }
    }
    lastStart = prevStart;
  }
  while (true) { // for subSpans[0] only the end position influences the match slop.
    int prevEnd = subSpans[0].endPosition();
    if (subSpans[0].nextStartPosition() == NO_MORE_POSITIONS) {
      oneExhaustedInCurrentDoc = true;
      break;
    }
    int ppEnd = subSpans[0].endPosition();
    if (ppEnd > lastStart) { // no more ordered
      break;
    }
    int slopIncrease = prevEnd - ppEnd;
    if (slopIncrease > 0) {
      break;
    }
    // slop did not increase:
    matchStart = subSpans[0].startPosition();
    matchSlop += slopIncrease;
    // FIX STARTS
    if (matchSlop <= allowedSlop) {
      break;
    }
    // FIX ENDS
  }
  firstSubSpansAfterMatch = true;
  boolean match = matchSlop <= allowedSlop;
  return match; // ordered and allowed slop
}
{code}
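To illustrate why the extra check matters, here is a rough standalone sketch of the subSpans[0] loop for the inner "location ... information" query. It is a toy model, not Lucene code: the class ShrinkSketch and its shrink() method are hypothetical, each term is assumed to cover exactly one position, and the positions (location at 9, 15 and 18, information at 19) assume plain whitespace tokenization of the sentence above with no stop-word removal.

```java
final class ShrinkSketch {
  /**
   * Toy model of the subSpans[0] loop in shrinkToDecreaseSlop().
   * firstStarts: start positions of the first clause (each span is one position long).
   * lastStart:   start position of the last clause.
   * earlyBreak:  whether the proposed extra check is applied.
   * Returns the reported match start, or -1 if the slop never becomes small enough.
   */
  static int shrink(int[] firstStarts, int lastStart, int allowedSlop, boolean earlyBreak) {
    int i = 0;
    int matchStart = firstStarts[0];
    int matchSlop = lastStart - (firstStarts[0] + 1); // gap between first end and last start
    while (true) {
      int prevEnd = firstStarts[i] + 1;
      if (++i == firstStarts.length) {
        break; // first clause exhausted in this document
      }
      int ppEnd = firstStarts[i] + 1;
      if (ppEnd > lastStart) {
        break; // no longer ordered
      }
      int slopIncrease = prevEnd - ppEnd;
      if (slopIncrease > 0) {
        break;
      }
      matchStart = firstStarts[i];
      matchSlop += slopIncrease;
      if (earlyBreak && matchSlop <= allowedSlop) {
        break; // the proposed fix: stop at the first position that fits
      }
    }
    return matchSlop <= allowedSlop ? matchStart : -1;
  }

  public static void main(String[] args) {
    int[] location = {9, 15, 18}; // assumed positions of "location"
    int information = 19;         // assumed position of "information"
    System.out.println(shrink(location, information, 3, false)); // prints 18
    System.out.println(shrink(location, information, 3, true));  // prints 15
  }
}
```

Under these assumptions, the loop without the check runs on to the "location" at position 18 and reports only that match. The outer query has slop 0 and needs the inner match to start at 15, directly after the "user" at position 14, so it fails. With the check the inner match starts at 15 and the outer query can match.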
Sorry for not providing a new patch. I'm on a previous version of Lucene.
> Nested Span Queries are buggy
> -----------------------------
>
> Key: LUCENE-7398
> URL: https://issues.apache.org/jira/browse/LUCENE-7398
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/search
> Affects Versions: 5.5, 6.x
> Reporter: Christoph Goller
> Assignee: Alan Woodward
> Priority: Critical
> Attachments: LUCENE-7398-20160814.patch, LUCENE-7398-20160924.patch,
> LUCENE-7398-20160925.patch, LUCENE-7398.patch, LUCENE-7398.patch,
> LUCENE-7398.patch, TestSpanCollection.java
>
>
> Example for a nested SpanQuery that is not working:
> Document: Human Genome Organization , HUGO , is trying to coordinate gene
> mapping research worldwide.
> Query: spanNear([body:coordinate, spanOr([spanNear([body:gene, body:mapping],
> 0, true), body:gene]), body:research], 0, true)
> The query should match "coordinate gene mapping research" as well as
> "coordinate gene research". It does not match "coordinate gene mapping
> research" with Lucene 5.5 or 6.1, it did however match with Lucene 4.10.4. It
> probably stopped working with the changes on SpanQueries in 5.3. I will
> attach a unit test that shows the problem.