[
https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12527134
]
Andy Liu commented on LUCENE-794:
---------------------------------
Ah, I wasn't crazy. I had the test data wrong. Here's the code I'm using to
produce the failing result:
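import java.io.StringReader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.CachingTokenFilter;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.highlight.*; // Highlighter, NullFragmenter, SpanScorer (from this patch)

String text = "y z x y z a b";
Analyzer analyzer = new StandardAnalyzer();
QueryParser parser = new QueryParser("body", analyzer);
Query query = parser.parse("\"x y z\"");
// Cache the tokens so that SpanScorer and the Highlighter can both read the same stream
CachingTokenFilter tokenStream =
    new CachingTokenFilter(analyzer.tokenStream("body", new StringReader(text)));
Highlighter highlighter = new Highlighter(new SpanScorer(query, "body", tokenStream));
highlighter.setTextFragmenter(new NullFragmenter());
// Rewind the cached tokens: SpanScorer has already consumed them
tokenStream.reset();
String result = highlighter.getBestFragments(tokenStream, text, 1, "...");
System.out.println(result);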
This produces:
<B>y</B> <B>z</B> <B>x</B> <B>y</B> <B>z</B> a b
The leading y and z shouldn't be highlighted, since they are not part of the "x y z" phrase match.
If I change the leading y and z to x and y, I get the correct result:
"x y x y z a b" => x y <B>x</B> <B>y</B> <B>z</B> a b
Here are a couple of other failing results:
"z x y z a b" => <B>z</B> <B>x</B> <B>y</B> <B>z</B> a b
"z a x y z a b" => <B>z</B> a <B>x</B> <B>y</B> <B>z</B> a b
FYI, I'm using the latest version of Lucene.
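For comparison, the highlighting one would expect from the phrase query "x y z" marks only the tokens of a contiguous, in-order occurrence of the phrase. The sketch below computes that expected output for the four test strings above. It is a minimal illustration, not SpanScorer's implementation: it assumes plain single-space tokenization (no Analyzer) and `<B>` tags, and the class `PhraseHighlightSketch` and method `highlightPhrase` are hypothetical names.

```java
import java.util.Arrays;

public class PhraseHighlightSketch {
    // Hypothetical helper: wraps in <B>..</B> only the tokens that belong to
    // an exact, contiguous occurrence of the phrase. Tokenizes on single spaces.
    static String highlightPhrase(String text, String phrase) {
        String[] tokens = text.split(" ");
        String[] terms = phrase.split(" ");
        boolean[] hit = new boolean[tokens.length];
        // Try every start position; mark the tokens of each full phrase match.
        for (int start = 0; start + terms.length <= tokens.length; start++) {
            boolean matches = true;
            for (int k = 0; k < terms.length; k++) {
                if (!tokens[start + k].equals(terms[k])) {
                    matches = false;
                    break;
                }
            }
            if (matches) {
                Arrays.fill(hit, start, start + terms.length, true);
            }
        }
        // Rebuild the text, tagging only tokens that were part of a match.
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < tokens.length; i++) {
            if (i > 0) out.append(' ');
            out.append(hit[i] ? "<B>" + tokens[i] + "</B>" : tokens[i]);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String[] inputs = {"y z x y z a b", "x y x y z a b", "z x y z a b", "z a x y z a b"};
        for (String text : inputs) {
            System.out.println(text + " => " + highlightPhrase(text, "x y z"));
        }
    }
}
```

Run as-is, this prints "y z <B>x</B> <B>y</B> <B>z</B> a b" for the first input, i.e. the leading y and z stay untagged because they never occur as part of a complete "x y z" match.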
> Extend contrib Highlighter to properly support phrase queries and span queries
> ------------------------------------------------------------------------------
>
> Key: LUCENE-794
> URL: https://issues.apache.org/jira/browse/LUCENE-794
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Other
> Reporter: Mark Miller
> Priority: Minor
> Attachments: CachedTokenStream.java, CachedTokenStream.java,
> CachedTokenStream.java, DefaultEncoder.java, Encoder.java, Formatter.java,
> Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java,
> Highlighter.java, HighlighterTest.java, HighlighterTest.java,
> HighlighterTest.java, HighlighterTest.java, MemoryIndex.java,
> QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java,
> QuerySpansExtractor.java, SimpleFormatter.java, spanhighlighter.patch,
> spanhighlighter10.patch, spanhighlighter2.patch, spanhighlighter3.patch,
> spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch,
> spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_patch_4.zip,
> SpanHighlighterTest.java, SpanHighlighterTest.java, SpanScorer.java,
> SpanScorer.java, WeightedSpanTerm.java
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter
> package that scores just like QueryScorer, but scores a 0 for Terms that did
> not cause the Query hit. This gives 'actual' hit highlighting for the range
> of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts
> to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.
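The Fragmenter idea in the description (split text into fragments without breaking up a span) can be illustrated on token positions. This is a hypothetical sketch, not the patch's Fragmenter: it works on token positions rather than character offsets, and `SpanAwareFragmentSketch`, `fragment`, and the inclusive `{firstPos, lastPos}` span encoding are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;

public class SpanAwareFragmentSketch {
    // Hypothetical sketch: splits token positions 0..numTokens-1 into fragments
    // of roughly fragmentSize tokens, but pushes a fragment boundary forward
    // whenever it would fall inside a span. Spans are {firstPos, lastPos}, inclusive.
    // Returns {start, end} pairs with end exclusive.
    static List<int[]> fragment(int numTokens, int fragmentSize, int[][] spans) {
        List<int[]> fragments = new ArrayList<>();
        int start = 0;
        while (start < numTokens) {
            int end = Math.min(start + fragmentSize, numTokens);
            boolean moved = true;
            while (moved) {
                moved = false;
                for (int[] span : spans) {
                    // A boundary at 'end' splits the span if it begins before 'end'
                    // and finishes at or after 'end'; extend past the span's last token.
                    if (end < numTokens && span[0] < end && span[1] >= end) {
                        end = Math.min(span[1] + 1, numTokens);
                        moved = true;
                    }
                }
            }
            fragments.add(new int[] {start, end});
            start = end;
        }
        return fragments;
    }

    public static void main(String[] args) {
        // Seven tokens ("y z x y z a b") with one span covering positions 2..4 ("x y z").
        for (int[] f : fragment(7, 3, new int[][] {{2, 4}})) {
            System.out.println("[" + f[0] + ", " + f[1] + ")");
        }
    }
}
```

With a target size of 3 tokens, the first boundary would land between the x and y of the phrase, so it is moved to just after z; the example prints "[0, 5)" and then "[5, 7)".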
--
This message is automatically generated by JIRA.