[ https://issues.apache.org/jira/browse/LUCENE-2287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839753#action_12839753 ]
Michael Goddard commented on LUCENE-2287:
-----------------------------------------

The backward-compatibility break was caused by adding

  public abstract Spans[] getSubSpans();

to the Spans class. I needed this to make recursion over nested Spans possible, and it seemed the right approach because NearSpansUnordered and NearSpansOrdered already had this method.

> Unexpected terms are highlighted within nested SpanQuery instances
> ------------------------------------------------------------------
>
>                 Key: LUCENE-2287
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2287
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/highlighter
>    Affects Versions: 2.9.1
>         Environment: Linux, Solaris, Windows
>            Reporter: Michael Goddard
>            Priority: Minor
>         Attachments: LUCENE-2287.patch, LUCENE-2287.patch, LUCENE-2287.patch,
>                      LUCENE-2287.patch, LUCENE-2287.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> I haven't yet been able to determine why I'm seeing spurious highlighting in
> nested SpanQuery instances. Briefly, the issue is illustrated by the second
> instance of "Lucene" being highlighted in the test below, even though it does
> not satisfy the inner span. There's been some discussion about this on the
> java-dev list, and I'm opening this issue now because I have made some
> initial progress on it.
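On the backward-compatibility break mentioned in the comment: declaring getSubSpans() abstract on Spans means every existing subclass of Spans outside Lucene stops compiling until it implements the new method. The sketch below uses hypothetical, simplified stand-ins (not the real org.apache.lucene.search.spans classes or their full method set) to show a common way to avoid that kind of break: a concrete default that returns an empty array, overridden only by composite spans, which still lets a caller recurse through nested spans. Whether this fits Lucene's design is an assumption the sketch does not settle.

```java
/**
 * Hypothetical, simplified stand-ins for Lucene's Spans hierarchy.
 * Only illustrates why an abstract getSubSpans() breaks old subclasses
 * and how a concrete default avoids that. Not the actual Lucene API.
 */
public class SubSpansCompatSketch {

    abstract static class Spans {
        abstract boolean next();
        abstract int start();
        abstract int end();

        // Concrete default meaning "no sub-spans". If this were abstract,
        // any pre-existing subclass that doesn't define it would fail to compile.
        Spans[] getSubSpans() {
            return new Spans[0];
        }
    }

    // Leaf span over a fixed list of token positions (hypothetical).
    static class TermSpans extends Spans {
        private final int[] positions;
        private int i = -1;

        TermSpans(int... positions) { this.positions = positions; }

        boolean next() { return ++i < positions.length; }
        int start() { return positions[i]; }
        int end() { return positions[i] + 1; }
        // Does not override getSubSpans(): inherits the empty default.
    }

    // Composite span that exposes its children (hypothetical; matching logic omitted).
    static class CompositeSpans extends Spans {
        private final Spans[] children;

        CompositeSpans(Spans... children) { this.children = children; }

        boolean next() { return false; }
        int start() { return -1; }
        int end() { return -1; }

        @Override
        Spans[] getSubSpans() { return children; }
    }

    // Recurse through getSubSpans(); a span with no children counts as one leaf.
    static int countLeaves(Spans s) {
        Spans[] subs = s.getSubSpans();
        if (subs.length == 0) return 1;
        int n = 0;
        for (Spans sub : subs) n += countLeaves(sub);
        return n;
    }

    public static void main(String[] args) {
        // Mirrors the nesting in the test: near(near(lucene, doug), hadoop).
        Spans inner = new CompositeSpans(new TermSpans(1, 8), new TermSpans(5));
        Spans outer = new CompositeSpans(inner, new TermSpans(10));
        System.out.println(countLeaves(outer)); // 3
    }
}
```

The design choice is simply that leaf classes written before the change keep compiling unchanged, while composite classes opt in by overriding.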
>
> This new test, added to the HighlighterTest class in lucene_2_9_1,
> illustrates this:
>
>   /*
>    * Ref: http://www.lucidimagination.com/blog/2009/07/18/the-spanquery/
>    */
>   public void testHighlightingNestedSpans2() throws Exception {
>     String theText = "The Lucene was made by Doug Cutting and Lucene great Hadoop was"; // Problem
>     //String theText = "The Lucene was made by Doug Cutting and the great Hadoop was"; // Works okay
>     String fieldName = "SOME_FIELD_NAME";
>     // Inner span: "lucene" followed by "doug", in order, with slop 5
>     SpanNearQuery spanNear = new SpanNearQuery(new SpanQuery[] {
>         new SpanTermQuery(new Term(fieldName, "lucene")),
>         new SpanTermQuery(new Term(fieldName, "doug")) }, 5, true);
>     // Outer span: the inner span followed by "hadoop", in order, with slop 4
>     Query query = new SpanNearQuery(new SpanQuery[] { spanNear,
>         new SpanTermQuery(new Term(fieldName, "hadoop")) }, 4, true);
>     String expected = "The <B>Lucene</B> was made by <B>Doug</B> Cutting and Lucene great <B>Hadoop</B> was";
>     //String expected = "The <B>Lucene</B> was made by <B>Doug</B> Cutting and the great <B>Hadoop</B> was";
>     String observed = highlightField(query, fieldName, theText);
>     System.out.println("Expected: \"" + expected + "\"\n" + "Observed: \"" + observed + "\"");
>     assertEquals("Why is that second instance of the term \"Lucene\" highlighted?",
>         expected, observed);
>   }
>
> Is this an issue that's arisen before? I've been reading through the source
> of QueryScorer, WeightedSpanTerm, WeightedSpanTermExtractor, Spans, and
> NearSpansOrdered, but haven't found the solution yet. Initially, I thought
> that the extractWeightedSpanTerms method in WeightedSpanTermExtractor should
> be called on each clause of a SpanNearQuery or SpanOrQuery, but that didn't
> get me very far.
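A simplified model of the symptom in the test, assuming whitespace tokenization with no stop-word removal and a brute-force ordered-near matcher (both assumptions; this is not the highlighter's code, and Lucene's NearSpansOrdered does not enumerate matches this way). It contrasts two strategies: highlighting every query term that falls anywhere inside a top-level span match, versus highlighting only the leaf term positions that actually produced the match, which is what recursing into sub-spans would provide. With the test sentence, the outer match covers positions 1 to 10, so the first strategy also marks the second "Lucene" at position 8. Records require Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class NestedSpanHighlightSketch {

    // A match covers token positions [start, end) and records the leaf term positions that produced it.
    record Match(int start, int end, List<Integer> leaves) {}

    // Hypothetical minimal query model: a term or an ordered near.
    interface Q {
        List<Match> matches(String[] tokens);
        void terms(Set<String> out);
    }

    record Term(String text) implements Q {
        public List<Match> matches(String[] tokens) {
            List<Match> out = new ArrayList<>();
            for (int i = 0; i < tokens.length; i++)
                if (tokens[i].equals(text)) out.add(new Match(i, i + 1, List.of(i)));
            return out;
        }
        public void terms(Set<String> out) { out.add(text); }
    }

    // Ordered near: clause matches must appear in order, and the summed gaps between them must not exceed slop.
    record OrderedNear(int slop, Q... clauses) implements Q {
        public List<Match> matches(String[] tokens) {
            List<Match> out = new ArrayList<>();
            extend(tokens, 0, null, 0, out);
            return out;
        }

        private void extend(String[] tokens, int idx, Match acc, int used, List<Match> out) {
            if (idx == clauses.length) { out.add(acc); return; }
            for (Match m : clauses[idx].matches(tokens)) {
                if (acc == null) { extend(tokens, idx + 1, m, 0, out); continue; }
                if (m.start() < acc.end()) continue;          // must follow the previous clause
                int gap = used + (m.start() - acc.end());
                if (gap > slop) continue;                     // too far apart
                List<Integer> leaves = new ArrayList<>(acc.leaves());
                leaves.addAll(m.leaves());
                extend(tokens, idx + 1, new Match(acc.start(), m.end(), leaves), gap, out);
            }
        }

        public void terms(Set<String> out) { for (Q c : clauses) c.terms(out); }
    }

    // Strategy A: any query term anywhere inside a top-level match window.
    static Set<Integer> windowHighlights(Q q, String[] tokens) {
        Set<String> terms = new TreeSet<>();
        q.terms(terms);
        Set<Integer> out = new TreeSet<>();
        for (Match m : q.matches(tokens))
            for (int p = m.start(); p < m.end(); p++)
                if (terms.contains(tokens[p])) out.add(p);
        return out;
    }

    // Strategy B: only the leaf positions that actually formed each match.
    static Set<Integer> leafHighlights(Q q, String[] tokens) {
        Set<Integer> out = new TreeSet<>();
        for (Match m : q.matches(tokens)) out.addAll(m.leaves());
        return out;
    }

    // Same structure as the test: near(near(lucene, doug, 5), hadoop, 4), both in order.
    static Q testQuery() {
        Q inner = new OrderedNear(5, new Term("lucene"), new Term("doug"));
        return new OrderedNear(4, inner, new Term("hadoop"));
    }

    static String[] tokenize(String text) { return text.toLowerCase().split("\\s+"); }

    public static void main(String[] args) {
        String[] tokens = tokenize("The Lucene was made by Doug Cutting and Lucene great Hadoop was");
        System.out.println(windowHighlights(testQuery(), tokens)); // [1, 5, 8, 10]
        System.out.println(leafHighlights(testQuery(), tokens));   // [1, 5, 10]
    }
}
```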