[ https://issues.apache.org/jira/browse/LUCENE-1522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Koji Sekiguchi updated LUCENE-1522: ----------------------------------- Attachment: LUCENE-1522.patch to apply this patch, LUCENE-1448 also need to be applied. {code} $ svn co -r713975 http://svn.apache.org/repos/asf/lucene/java/trunk $ cd trunk $ patch -p0 < LUCENE-1448.patch $ patch -p0 < LUCENE-1522.patch {code} > another highlighter > ------------------- > > Key: LUCENE-1522 > URL: https://issues.apache.org/jira/browse/LUCENE-1522 > Project: Lucene - Java > Issue Type: Improvement > Components: contrib/highlighter > Reporter: Koji Sekiguchi > Priority: Minor > Attachments: LUCENE-1522.patch > > > I've written this highlighter for my project to support bi-gram token stream. > The idea was inherited from my previous project with my colleague and > LUCENE-644. This approach needs highlight fields to be > TermVector.WITH_POSITIONS_OFFSETS, but is fast and can support N-grams. This > depends on LUCENE-1448 to get refined term offsets. > usage: > {code:java} > Highlighter h = new Highlighter(); > FieldQuery fq = h.getFieldQuery( query ); > // docId=0, fieldName="content", fragCharSize=100, numFragments=3 > String[] fragments = h.getBestFragments( fq, reader, 0, "content", 100, 3 ); > {code} > features: > - fast for large docs > - supports "fixed size" N-gram (e.g. (2,2), not (1,3)) (can solve LUCENE-1489) > - supports PhraseQuery, phrase-unit highlighting with slops > {noformat} > q="w1 w2" > <b>w1 w2</b> > --------------- > q="w1 w2"~1 > <b>w1</b> w3 <b>w2</b> w3 <b>w1 w2</b> > {noformat} > - highlight fields need to be TermVector.WITH_POSITIONS_OFFSETS > - easy to apply patch due to independent package (contrib/highlighter2) > - uses Java 1.5 > - looks query boost to score fragments (currently doesn't see idf, but it > should be possible) > - pluggable FragListBuilder > - pluggable FragmentsBuilder > to do: > - term positions can be unnecessary when phraseHighlight==false > - collects performance numbers -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. --------------------------------------------------------------------- To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org