[jira] [Commented] (LUCENE-7682) UnifiedHighlighter not highlighting all terms relevant in SpanNearQuery

Michael Braun (JIRA) Thu, 09 Feb 2017 11:01:12 -0800

    [ https://issues.apache.org/jira/browse/LUCENE-7682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15860002#comment-15860002 ]


Michael Braun commented on LUCENE-7682:
---------------------------------------

I think I know why some of this is happening. In NearSpansOrdered, 
stretchToOrder works out the effective position range it needs to search 
over and advances each sub-span far enough to produce a match. The second 
span is advanced only until the first occurrence of "feed" matches, which is 
enough to satisfy the query. matchEnd is set to that occurrence's end 
position (matchWidth is updated as well), and the search stops there. As a 
result, NearSpansOrdered effectively never sees the last occurrence of "feed" 
when twoPhaseCurrentDocMatches() is called (from getTermToSpans in 
PhraseHelper). The end position of the first "feed" occurrence is used 
instead of the last end position within the slop.
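
To make the effect concrete, here is a minimal stand-alone sketch. It is not 
Lucene code and does not call any Lucene API: the class and method names are 
made up for illustration, the token positions assume plain whitespace 
tokenization with no stop-word removal (wildlife at 3, feed at 4 and 7), and 
the slop check is simplified to the number of positions between the two 
terms. It only contrasts stopping at the first satisfying occurrence with 
collecting every occurrence that lies within the slop.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of an ordered two-term near match; NOT Lucene's implementation.
final class NearSpanSketch {

    // Returns the positions of the second term that end up "seen" as matches.
    // firstOnly = true models stopping as soon as one occurrence satisfies the query.
    static List<Integer> matchedSecondTermPositions(
            int[] first, int[] second, int slop, boolean firstOnly) {
        List<Integer> matched = new ArrayList<>();
        for (int f : first) {
            for (int s : second) {
                // Positions strictly between the end of the first term and the second term.
                int gap = s - (f + 1);
                if (s > f && gap <= slop) {
                    if (!matched.contains(s)) {
                        matched.add(s);
                    }
                    if (firstOnly) {
                        break; // later occurrences within the slop are never examined
                    }
                }
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        int[] wildlife = {3}; // assumed position of "wildlife"
        int[] feed = {4, 7};  // assumed positions of the two "feed" tokens
        System.out.println(matchedSecondTermPositions(wildlife, feed, 9, true));  // [4]
        System.out.println(matchedSecondTermPositions(wildlife, feed, 9, false)); // [4, 7]
    }
}
```

With firstOnly set, only position 4 is reported, which corresponds to the 
single highlighted "feed"; the highlighter would need something closer to the 
second call to mark both occurrences.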

> UnifiedHighlighter not highlighting all terms relevant in SpanNearQuery
> -----------------------------------------------------------------------
>
>                 Key: LUCENE-7682
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7682
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: modules/highlighter
>            Reporter: Michael Braun
>
> Original text: "Something for protecting wildlife feed in a feed thing."
> Query is:
>    SpanNearQuery with Slop 9 - in order - 
>       1. SpanTermQuery(wildlife)
>       2. SpanTermQuery(feed)
> This should highlight both instances of "feed", since both are within a 
> slop of 9 of "wildlife". However, only the first instance is highlighted. 
> This occurs with an unordered SpanNearQuery as well. The test below 
> reproduces it. Affects both the current 6.x line and master.
> Test that fits within TestUnifiedHighlighterMTQ:
> {code}
>   public void testOrderedSpanNearQueryWithDupeTerms() throws Exception {
>     // Index a single document in which "feed" occurs twice after "wildlife".
>     RandomIndexWriter iw =
>         new RandomIndexWriter(random(), dir, indexAnalyzer);
>     Document doc = new Document();
>     doc.add(new Field("body",
>         "Something for protecting wildlife feed in a feed thing.", fieldType));
>     doc.add(newTextField("id", "id", Field.Store.YES));
>     iw.addDocument(doc);
>     IndexReader ir = iw.getReader();
>     iw.close();
>     IndexSearcher searcher = newSearcher(ir);
>     UnifiedHighlighter highlighter =
>         new UnifiedHighlighter(searcher, indexAnalyzer);
>     int docID =
>         searcher.search(new TermQuery(new Term("id", "id")), 1).scoreDocs[0].doc;
>     // Ordered SpanNearQuery: "wildlife" followed by "feed" within a slop of 9.
>     SpanTermQuery termOne = new SpanTermQuery(new Term("body", "wildlife"));
>     SpanTermQuery termTwo = new SpanTermQuery(new Term("body", "feed"));
>     SpanNearQuery topQuery = new SpanNearQuery.Builder("body", true)
>         .setSlop(9)
>         .addClause(termOne)
>         .addClause(termTwo)
>         .build();
>     int[] docIds = new int[] {docID};
>     String[] snippets = highlighter.highlightFields(
>         new String[] {"body"}, topQuery, docIds, new int[] {2}).get("body");
>     assertEquals(1, snippets.length);
>     // Both occurrences of "feed" are expected to be highlighted.
>     assertEquals(
>         "Something for protecting <b>wildlife</b> <b>feed</b> in a <b>feed</b> thing.",
>         snippets[0]);
>     ir.close();
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

