highlighter problem with n-gram tokens
--------------------------------------

                 Key: LUCENE-1489
                 URL: https://issues.apache.org/jira/browse/LUCENE-1489
             Project: Lucene - Java
          Issue Type: Bug
          Components: contrib/highlighter
            Reporter: Koji Sekiguchi
            Priority: Minor


I have a problem when using n-gram tokenization together with the highlighter. I thought this had been solved in LUCENE-627...

I originally found the problem when using CJKTokenizer on Solr; here is a Lucene program that reproduces it using NGramTokenizer(min=2,max=2) instead of CJKTokenizer:

{code:java}
import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.ngram.NGramTokenizer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.QueryScorer;

public class TestNGramHighlighter {

  public static void main(String[] args) throws Exception {
    Analyzer analyzer = new NGramAnalyzer();
    final String TEXT = "Lucene can make index. Then Lucene can search.";
    final String QUERY = "can";
    QueryParser parser = new QueryParser("f", analyzer);
    Query query = parser.parse(QUERY);
    QueryScorer scorer = new QueryScorer(query, "f");
    Highlighter h = new Highlighter(scorer);
    // highlight TEXT against the parsed query
    System.out.println(h.getBestFragment(analyzer, "f", TEXT));
  }

  /** Analyzer that emits 2-grams only (min=2, max=2). */
  static class NGramAnalyzer extends Analyzer {
    public TokenStream tokenStream(String field, Reader input) {
      return new NGramTokenizer(input, 2, 2);
    }
  }
}
{code}

The expected output is:
Lucene <B>can</B> make index. Then Lucene <B>can</B> search.

but the actual output is:
Lucene <B>can make index. Then Lucene can</B> search.
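For reference, NGramTokenizer(min=2,max=2) splits the query term "can" into the overlapping bigrams "ca" and "an", and both occurrences of "can" in the text produce those same bigrams; the merged highlight above suggests the highlighter is grouping all of these token matches into one span. Below is a minimal sketch of the bigrams involved, in plain Java with no Lucene dependency (the class and method names here are made up for illustration, not part of Lucene):

{code:java}
import java.util.ArrayList;
import java.util.List;

// Illustration only: mimics the character bigrams that a
// min=2,max=2 n-gram tokenizer would emit for a string.
public class BigramSketch {

  static List<String> bigrams(String s) {
    List<String> out = new ArrayList<String>();
    for (int i = 0; i + 2 <= s.length(); i++) {
      out.add(s.substring(i, i + 2));
    }
    return out;
  }

  public static void main(String[] args) {
    // The query term "can" becomes two overlapping bigrams.
    System.out.println(bigrams("can")); // prints [ca, an]
  }
}
{code}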

