another highlighter
-------------------

                 Key: LUCENE-1522
                 URL: https://issues.apache.org/jira/browse/LUCENE-1522
             Project: Lucene - Java
          Issue Type: Improvement
          Components: contrib/highlighter
            Reporter: Koji Sekiguchi
            Priority: Minor

I've written this highlighter for my project to support bi-gram token streams. The idea was inherited from a previous project with my colleague and from LUCENE-644. This approach requires the highlighted fields to be indexed with TermVector.WITH_POSITIONS_OFFSETS, but it is fast and can support N-grams. It depends on LUCENE-1448 to get refined term offsets.

usage:

{code:java}
Highlighter h = new Highlighter();
FieldQuery fq = h.getFieldQuery( query );
// docId=0, fieldName="content", fragCharSize=100, numFragments=3
String[] fragments = h.getBestFragments( fq, reader, 0, "content", 100, 3 );
{code}

features:

- fast for large docs
- supports "fixed size" N-grams (e.g. (2,2), not (1,3)) (can solve LUCENE-1489)
- supports PhraseQuery, with phrase-unit highlighting that honors slop
{noformat}
q="w1 w2"
<b>w1 w2</b>
---------------
q="w1 w2"~1
<b>w1</b> w3 <b>w2</b> w3 <b>w1 w2</b>
{noformat}
- highlighted fields need to be TermVector.WITH_POSITIONS_OFFSETS
- easy to apply the patch because it lives in an independent package (contrib/highlighter2)
- uses Java 1.5
- uses the query boost to score fragments (idf is not currently taken into account, but that should be possible)
- pluggable FragListBuilder
- pluggable FragmentsBuilder

to do:

- term positions may be unnecessary when phraseHighlight==false
- collect performance numbers
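
To make the phrase-unit example above concrete, here is a small self-contained sketch of how stored positions and offsets could drive highlighting without re-analyzing the text. It is not the code from the patch: the class PhraseOffsetHighlightSketch, the TermInfo type and the whitespace tokenizer standing in for a stored term vector are all hypothetical, and the slop handling is a simplified in-order match rather than Lucene's actual sloppy phrase semantics.

```java
import java.util.ArrayList;
import java.util.List;

final class PhraseOffsetHighlightSketch {

    /** One entry of a hypothetical term vector: text, char offsets, token position. */
    static final class TermInfo {
        final String text;
        final int start;
        final int end;
        final int position;

        TermInfo(String text, int start, int end, int position) {
            this.text = text;
            this.start = start;
            this.end = end;
            this.position = position;
        }
    }

    /** Whitespace tokenizer standing in for a stored term vector (assumption). */
    static List<TermInfo> termVector(String text) {
        List<TermInfo> terms = new ArrayList<>();
        int pos = 0, i = 0, n = text.length();
        while (i < n) {
            while (i < n && Character.isWhitespace(text.charAt(i))) i++;
            int start = i;
            while (i < n && !Character.isWhitespace(text.charAt(i))) i++;
            if (i > start) terms.add(new TermInfo(text.substring(start, i), start, i, pos++));
        }
        return terms;
    }

    /** Simplified in-order phrase match: the summed position gaps must not exceed slop. */
    static List<List<TermInfo>> findPhrases(List<TermInfo> terms, String[] phrase, int slop) {
        List<List<TermInfo>> matches = new ArrayList<>();
        int i = 0;
        while (i < terms.size()) {
            List<TermInfo> match = new ArrayList<>();
            int idx = i, budget = slop;
            boolean ok = terms.get(i).text.equals(phrase[0]);
            if (ok) match.add(terms.get(i));
            for (int k = 1; ok && k < phrase.length; k++) {
                int found = -1;
                for (int j = idx + 1; j < terms.size(); j++) {
                    int gap = terms.get(j).position - terms.get(idx).position - 1;
                    if (gap > budget) break;
                    if (terms.get(j).text.equals(phrase[k])) { found = j; budget -= gap; break; }
                }
                ok = found >= 0;
                if (ok) { match.add(terms.get(found)); idx = found; }
            }
            if (ok) { matches.add(match); i = idx + 1; } else { i++; }
        }
        return matches;
    }

    /** Wraps matched terms in <b> tags using offsets only; adjacent terms share one tag pair. */
    static String highlight(String text, String[] phrase, int slop) {
        StringBuilder out = new StringBuilder();
        int cursor = 0;
        for (List<TermInfo> match : findPhrases(termVector(text), phrase, slop)) {
            int k = 0;
            while (k < match.size()) {
                int start = match.get(k).start;
                int end = match.get(k).end;
                while (k + 1 < match.size()
                        && match.get(k + 1).position == match.get(k).position + 1) {
                    k++;
                    end = match.get(k).end;
                }
                out.append(text, cursor, start).append("<b>").append(text, start, end).append("</b>");
                cursor = end;
                k++;
            }
        }
        return out.append(text, cursor, text.length()).toString();
    }

    public static void main(String[] args) {
        // prints: <b>w1</b> w3 <b>w2</b> w3 <b>w1 w2</b>
        System.out.println(highlight("w1 w3 w2 w3 w1 w2", new String[] {"w1", "w2"}, 1));
    }
}
```

With slop 0 the same input highlights only the final adjacent pair, which mirrors the first {noformat} case. Because the tags are placed purely from offsets, the cost is driven by the number of matched terms, not by re-tokenizing the document. That is the intuition behind "fast for large docs", though the real implementation may differ.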