[ https://issues.apache.org/jira/browse/LUCENE-7682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15884727#comment-15884727 ]
Paul Elschot commented on LUCENE-7682:
--------------------------------------
For a query requiring t1 near t2 with enough slop, the text "t1 t1 t2" matches
twice, but the text "t1 t2 t2" matches only once. This behaviour was introduced
with the lazy iteration, see:
https://issues.apache.org/jira/browse/LUCENE-6537?focusedCommentId=14579537&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14579537
This is also a problem for LUCENE-7580 where matching term occurrences are
scored: there the second occurrence of t2 will not influence the score because
it is never reported as a match.
LUCENE-7398 is probably also of interest here.
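To make the asymmetry concrete, here is a minimal sketch of lazy ordered matching for two terms. It is a simplified model written for this comment, not the Lucene implementation: the class LazyOrderedNear and the method matches are hypothetical names, each term is assumed to occupy one position, and the position arrays are assumed to be sorted in ascending order.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified, hypothetical model of lazy ordered near matching (not Lucene code). */
public class LazyOrderedNear {

  /** Returns [start, end) pairs for t1 followed by t2 within the given slop. */
  static List<int[]> matches(int[] t1, int[] t2, int slop) {
    List<int[]> out = new ArrayList<>();
    int j = 0; // lazily advanced cursor into t2; it never moves backwards
    for (int p1 : t1) {
      int t1End = p1 + 1; // assumption: a term occupies exactly one position
      while (j < t2.length && t2[j] < t1End) {
        j++; // skip t2 occurrences that start before this t1 ends
      }
      if (j == t2.length) {
        break; // t2 is exhausted, so no further match is possible
      }
      if (t2[j] - t1End <= slop) {
        out.add(new int[] {p1, t2[j] + 1});
      }
    }
    return out;
  }

  public static void main(String[] args) {
    // "t1 t1 t2": each t1 occurrence pairs with the single t2.
    System.out.println(matches(new int[] {0, 1}, new int[] {2}, 5).size()); // prints 2
    // "t1 t2 t2": t1 runs out before the second t2 is ever visited.
    System.out.println(matches(new int[] {0}, new int[] {1, 2}, 5).size()); // prints 1
  }
}
```

In this model a match is only produced when the first term advances, so repeated occurrences of the last term are never reported.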
To improve highlighting and scoring, we will probably have to rethink how
matches of span queries are reported.
One way could be to report all occurrences in the matching window, and then
advance all the sub-spans past the matching window.
Would that be feasible?
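A rough sketch of one possible reading of this idea, for an ordered query on two terms, follows. Everything in it is an assumption made for illustration: the class WindowNear and the method windows are hypothetical names, the "matching window" is taken to run from the first matching t1 occurrence to the last t2 occurrence still within slop of it, each term is assumed to occupy one position, and the position arrays are assumed to be sorted. It is not Lucene code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch of window-based match reporting (not Lucene code). */
public class WindowNear {

  /** Returns, per matching window, the sorted positions of all t1 and t2 occurrences in it. */
  static List<List<Integer>> windows(int[] t1, int[] t2, int slop) {
    List<List<Integer>> out = new ArrayList<>();
    int i = 0; // cursor into t1
    int j = 0; // cursor into t2
    while (i < t1.length) {
      int t1End = t1[i] + 1; // assumption: a term occupies exactly one position
      while (j < t2.length && t2[j] < t1End) {
        j++; // t2 must start at or after the end of t1
      }
      if (j == t2.length) {
        break;
      }
      if (t2[j] - t1End > slop) {
        i++; // this t1 is too far from the next t2, try the next t1
        continue;
      }
      // Extend the window to the last t2 still within slop of the first t1.
      int last = j;
      while (last + 1 < t2.length && t2[last + 1] - t1End <= slop) {
        last++;
      }
      int windowEnd = t2[last] + 1; // exclusive
      List<Integer> occurrences = new ArrayList<>();
      // Every t1 before the last t2 is within slop of it, so report it.
      while (i < t1.length && t1[i] < t2[last]) {
        occurrences.add(t1[i++]);
      }
      while (j <= last) {
        occurrences.add(t2[j++]);
      }
      // Forward t1 past the window; t2 is already past it.
      while (i < t1.length && t1[i] < windowEnd) {
        i++;
      }
      Collections.sort(occurrences);
      out.add(occurrences);
    }
    return out;
  }

  public static void main(String[] args) {
    // Positions from the issue text: wildlife at 3, feed at 4 and 7, slop 9.
    System.out.println(windows(new int[] {3}, new int[] {4, 7}, 9)); // prints [[3, 4, 7]]
  }
}
```

Under these assumptions both "t1 t1 t2" and "t1 t2 t2" yield a single window containing all three occurrences, so the second "feed" in the issue text would be available to the highlighter and to scoring.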
> UnifiedHighlighter not highlighting all terms relevant in SpanNearQuery
> -----------------------------------------------------------------------
>
> Key: LUCENE-7682
> URL: https://issues.apache.org/jira/browse/LUCENE-7682
> Project: Lucene - Core
> Issue Type: Bug
> Components: modules/highlighter
> Reporter: Michael Braun
>
> Original text: "Something for protecting wildlife feed in a feed thing."
> Query is:
> SpanNearQuery with Slop 9 - in order -
> 1. SpanTermQuery(wildlife)
> 2. SpanTermQuery(feed)
> This should highlight both instances of "feed" since they are both within
> slop of 9 of "wildlife". However, only the first instance is highlighted.
> This occurs with unordered SpanNearQuery as well. Test below replicates.
> Affects both the current 6.x line and master.
> Test that fits within TestUnifiedHighlighterMTQ:
> {code}
> public void testOrderedSpanNearQueryWithDupeTerms() throws Exception {
>   RandomIndexWriter iw = new RandomIndexWriter(random(), dir, indexAnalyzer);
>   Document doc = new Document();
>   doc.add(new Field("body",
>       "Something for protecting wildlife feed in a feed thing.", fieldType));
>   doc.add(newTextField("id", "id", Field.Store.YES));
>   iw.addDocument(doc);
>   IndexReader ir = iw.getReader();
>   iw.close();
>   IndexSearcher searcher = newSearcher(ir);
>   UnifiedHighlighter highlighter = new UnifiedHighlighter(searcher, indexAnalyzer);
>   int docID = searcher.search(new TermQuery(new Term("id", "id")), 1).scoreDocs[0].doc;
>   SpanTermQuery termOne = new SpanTermQuery(new Term("body", "wildlife"));
>   SpanTermQuery termTwo = new SpanTermQuery(new Term("body", "feed"));
>   SpanNearQuery topQuery = new SpanNearQuery.Builder("body", true)
>       .setSlop(9)
>       .addClause(termOne)
>       .addClause(termTwo)
>       .build();
>   int[] docIds = new int[] {docID};
>   String[] snippets = highlighter.highlightFields(
>       new String[] {"body"}, topQuery, docIds, new int[] {2}).get("body");
>   assertEquals(1, snippets.length);
>   assertEquals(
>       "Something for protecting <b>wildlife</b> <b>feed</b> in a <b>feed</b> thing.",
>       snippets[0]);
>   ir.close();
> }
> {code}