highlighter problem with n-gram tokens
--------------------------------------
Key: LUCENE-1489
URL: https://issues.apache.org/jira/browse/LUCENE-1489
Project: Lucene - Java
Issue Type: Bug
Components: contrib/highlighter
Reporter: Koji Sekiguchi
Priority: Minor
I have a problem when using n-gram tokenization with the highlighter. I thought it had been
solved in LUCENE-627...
I originally found this problem when using CJKTokenizer on Solr; here is a Lucene
program that reproduces it using NGramTokenizer(min=2,max=2) instead of
CJKTokenizer:
{code:java}
import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.ngram.NGramTokenizer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.QueryScorer;

public class TestNGramHighlighter {

  public static void main(String[] args) throws Exception {
    Analyzer analyzer = new NGramAnalyzer();
    final String TEXT = "Lucene can make index. Then Lucene can search.";
    final String QUERY = "can";
    QueryParser parser = new QueryParser("f", analyzer);
    Query query = parser.parse(QUERY);
    QueryScorer scorer = new QueryScorer(query, "f");
    Highlighter h = new Highlighter(scorer);
    System.out.println(h.getBestFragment(analyzer, "f", TEXT));
  }

  static class NGramAnalyzer extends Analyzer {
    public TokenStream tokenStream(String field, Reader input) {
      return new NGramTokenizer(input, 2, 2);
    }
  }
}
{code}
The expected output is:
Lucene <B>can</B> make index. Then Lucene <B>can</B> search.
but the actual output is:
Lucene <B>can make index. Then Lucene can</B> search.
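To show that the matching bigrams are confined to the two occurrences of "can", here is a plain-Java sketch with no Lucene dependencies (the class and method names are mine, for illustration only). It reproduces what NGramTokenizer(min=2,max=2) does to the raw input, then lists the start offset of every text bigram that equals one of the query's bigrams:

```java
import java.util.ArrayList;
import java.util.List;

public class BigramMatchDemo {

  // Overlapping 2-grams of a string, mimicking NGramTokenizer(input, 2, 2),
  // which grams the raw input, whitespace included.
  static List<String> bigrams(String s) {
    List<String> grams = new ArrayList<String>();
    for (int i = 0; i + 2 <= s.length(); i++) {
      grams.add(s.substring(i, i + 2));
    }
    return grams;
  }

  // Start offsets of every text bigram that equals some query bigram.
  static List<Integer> matchOffsets(String text, String query) {
    List<String> queryGrams = bigrams(query); // "can" -> ["ca", "an"]
    List<String> textGrams = bigrams(text);
    List<Integer> offsets = new ArrayList<Integer>();
    for (int i = 0; i < textGrams.size(); i++) {
      if (queryGrams.contains(textGrams.get(i))) {
        offsets.add(i);
      }
    }
    return offsets;
  }

  public static void main(String[] args) {
    String text = "Lucene can make index. Then Lucene can search.";
    System.out.println(matchOffsets(text, "can")); // prints [7, 8, 35, 36]
  }
}
```

The hits fall in two separate clusters, character offsets 7-10 and 35-38, exactly the two "can" occurrences, so two separate highlights are expected. The actual output is consistent with the highlighter merging everything from the first hit (offset 7) to the end of the last hit (offset 38) into one highlighted span.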