highlighter problem with n-gram tokens --------------------------------------
                 Key: LUCENE-1489
                 URL: https://issues.apache.org/jira/browse/LUCENE-1489
             Project: Lucene - Java
          Issue Type: Bug
          Components: contrib/highlighter
            Reporter: Koji Sekiguchi
            Priority: Minor

I have a problem when using n-grams together with the highlighter. I thought it had been solved in LUCENE-627...

I actually found this problem when using CJKTokenizer on Solr, but here is a Lucene program that reproduces it using NGramTokenizer(min=2,max=2) instead of CJKTokenizer:

{code:java}
import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.ngram.NGramTokenizer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.QueryScorer;

public class TestNGramHighlighter {

  public static void main(String[] args) throws Exception {
    Analyzer analyzer = new NGramAnalyzer();
    final String TEXT = "Lucene can make index. Then Lucene can search.";
    final String QUERY = "can";
    QueryParser parser = new QueryParser("f", analyzer);
    Query query = parser.parse(QUERY);
    QueryScorer scorer = new QueryScorer(query, "f");
    Highlighter h = new Highlighter(scorer);
    System.out.println(h.getBestFragment(analyzer, "f", TEXT));
  }

  // Analyzer that produces character bigrams for every field.
  static class NGramAnalyzer extends Analyzer {
    public TokenStream tokenStream(String field, Reader input) {
      return new NGramTokenizer(input, 2, 2);
    }
  }
}
{code}

The expected output is:

Lucene <B>can</B> make index. Then Lucene <B>can</B> search.

but the actual output is:

Lucene <B>can make index. Then Lucene can</B> search.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
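To see why the highlighted span stretches across both occurrences, it may help to look at what the bigram analysis produces. The sketch below is not part of the report; `BigramDemo` and its `bigrams` helper are hypothetical names, and the loop only mimics what NGramTokenizer(min=2,max=2) emits. The query "can" becomes the tokens "ca" and "an", and each of those tokens matches the text in two separate places, so a highlighter that merges all matching token offsets into one fragment ends up wrapping everything from the first match to the last:

```java
import java.util.ArrayList;
import java.util.List;

public class BigramDemo {

  // Mimics NGramTokenizer(min=2,max=2): every adjacent character pair.
  static List<String> bigrams(String s) {
    List<String> out = new ArrayList<String>();
    for (int i = 0; i + 2 <= s.length(); i++) {
      out.add(s.substring(i, i + 2));
    }
    return out;
  }

  public static void main(String[] args) {
    String text = "Lucene can make index. Then Lucene can search.";

    // The query "can" is analyzed into two bigram tokens.
    System.out.println(bigrams("can")); // prints [ca, an]

    // "can" occurs twice in the text, at offsets 7 and 35,
    // so the query bigrams match at two widely separated spans.
    int first = text.indexOf("can");       // 7
    int second = text.indexOf("can", 8);   // 35
    System.out.println(first + " " + second);
  }
}
```

If the highlighter groups these matches into a single fragment, the <B> tag opens at offset 7 and closes after offset 35, which is exactly the "Lucene <B>can make index. Then Lucene can</B> search." output shown above.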