[
https://issues.apache.org/jira/browse/LUCENE-7682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15860002#comment-15860002
]
Michael Braun commented on LUCENE-7682:
---------------------------------------
I think I know why some of this is happening. In NearSpansOrdered,
stretchToOrder works out the effective position length it needs to search
over and advances each span just far enough to produce a match. The second
span is advanced only until the first occurrence of "feed" matches (which
satisfies the query); matchEnd is set to that occurrence's end position
(matchWidth is updated as well), and it stops there. As a result,
NearSpansOrdered never sees the last occurrence of "feed" when
twoPhaseCurrentDocMatches() is called (from getTermToSpans in PhraseHelper):
the end position of the first "feed" occurrence is used instead of the last
end position within the slop.
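
As a rough illustration of the difference described above, here is a
standalone sketch. It is not Lucene code: the class NearSpanSketch and its
two methods are hypothetical, and the positions assume plain whitespace
tokenization with no stop-word removal (wildlife at position 3, feed at
positions 4 and 7). The first method stops at the earliest ordered match,
the second collects every end position that falls within the slop.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of an ordered two-term near match; not the Lucene implementation.
class NearSpanSketch {

    // Returns the (exclusive) end position of the first ordered match, or -1 if none.
    // Mirrors the "advance just enough and stop" behavior described above.
    static int firstMatchEnd(int[] first, int[] second, int slop) {
        for (int f : first) {
            int firstEnd = f + 1; // single-term spans are one position wide
            for (int s : second) {
                int gap = s - firstEnd; // positions between the two terms
                if (gap >= 0 && gap <= slop) {
                    return s + 1;
                }
            }
        }
        return -1;
    }

    // Returns the end positions of all ordered matches within the slop.
    // This is what a highlighter would need in order to mark every occurrence.
    static List<Integer> allMatchEnds(int[] first, int[] second, int slop) {
        List<Integer> ends = new ArrayList<>();
        for (int f : first) {
            int firstEnd = f + 1;
            for (int s : second) {
                int gap = s - firstEnd;
                if (gap >= 0 && gap <= slop) {
                    ends.add(s + 1);
                }
            }
        }
        return ends;
    }

    public static void main(String[] args) {
        int[] wildlife = {3};
        int[] feed = {4, 7};
        System.out.println(firstMatchEnd(wildlife, feed, 9)); // prints 5
        System.out.println(allMatchEnds(wildlife, feed, 9));  // prints [5, 8]
    }
}
```

With these assumed positions, a highlighter that only looks at the first end
position (5) would cover the first "feed" and miss the one ending at
position 8, which matches the symptom in the report.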
> UnifiedHighlighter not highlighting all terms relevant in SpanNearQuery
> -----------------------------------------------------------------------
>
> Key: LUCENE-7682
> URL: https://issues.apache.org/jira/browse/LUCENE-7682
> Project: Lucene - Core
> Issue Type: Bug
> Components: modules/highlighter
> Reporter: Michael Braun
>
> Original text: "Something for protecting wildlife feed in a feed thing."
> Query is:
> SpanNearQuery with Slop 9 - in order -
> 1. SpanTermQuery(wildlife)
> 2. SpanTermQuery(feed)
> This should highlight both instances of "feed" since they are both within
> slop of 9 of "wildlife". However, only the first instance is highlighted.
> This occurs with unordered SpanNearQuery as well. Test below replicates.
> Affects both the current 6.x line and master.
> Test that fits within TestUnifiedHighlighterMTQ:
> {code}
> public void testOrderedSpanNearQueryWithDupeTerms() throws Exception {
>   RandomIndexWriter iw = new RandomIndexWriter(random(), dir, indexAnalyzer);
>   Document doc = new Document();
>   doc.add(new Field("body",
>       "Something for protecting wildlife feed in a feed thing.", fieldType));
>   doc.add(newTextField("id", "id", Field.Store.YES));
>   iw.addDocument(doc);
>   IndexReader ir = iw.getReader();
>   iw.close();
>   IndexSearcher searcher = newSearcher(ir);
>   UnifiedHighlighter highlighter = new UnifiedHighlighter(searcher, indexAnalyzer);
>   int docID = searcher.search(new TermQuery(new Term("id", "id")), 1).scoreDocs[0].doc;
>   SpanTermQuery termOne = new SpanTermQuery(new Term("body", "wildlife"));
>   SpanTermQuery termTwo = new SpanTermQuery(new Term("body", "feed"));
>   SpanNearQuery topQuery = new SpanNearQuery.Builder("body", true)
>       .setSlop(9)
>       .addClause(termOne)
>       .addClause(termTwo)
>       .build();
>   int[] docIds = new int[] {docID};
>   String[] snippets = highlighter.highlightFields(
>       new String[] {"body"}, topQuery, docIds, new int[] {2}).get("body");
>   assertEquals(1, snippets.length);
>   assertEquals(
>       "Something for protecting <b>wildlife</b> <b>feed</b> in a <b>feed</b> thing.",
>       snippets[0]);
>   ir.close();
> }
> {code}