[ 
https://issues.apache.org/jira/browse/LUCENE-1522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi updated LUCENE-1522:
-----------------------------------

    Attachment: LUCENE-1522.patch

To apply this patch, LUCENE-1448 also needs to be applied.
{code}
$ svn co -r713975 http://svn.apache.org/repos/asf/lucene/java/trunk
$ cd trunk
$ patch -p0 < LUCENE-1448.patch
$ patch -p0 < LUCENE-1522.patch
{code}


> another highlighter
> -------------------
>
>                 Key: LUCENE-1522
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1522
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/highlighter
>            Reporter: Koji Sekiguchi
>            Priority: Minor
>         Attachments: LUCENE-1522.patch
>
>
> I've written this highlighter for my project to support bi-gram token streams. 
> The idea was inherited from my previous project with my colleague and from 
> LUCENE-644. This approach requires highlighted fields to be indexed with 
> TermVector.WITH_POSITIONS_OFFSETS, but it is fast and can support N-grams. It 
> depends on LUCENE-1448 to get refined term offsets.
> usage:
> {code:java}
> Highlighter h = new Highlighter();
> FieldQuery fq = h.getFieldQuery( query );
> // docId=0, fieldName="content", fragCharSize=100, numFragments=3
> String[] fragments = h.getBestFragments( fq, reader, 0, "content", 100, 3 );
> {code}
> features:
> - fast for large docs
> - supports "fixed size" N-gram (e.g. (2,2), not (1,3)) (can solve LUCENE-1489)
> - supports PhraseQuery, phrase-unit highlighting with slops
> {noformat}
> q="w1 w2"
> <b>w1 w2</b>
> ---------------
> q="w1 w2"~1
> <b>w1</b> w3 <b>w2</b> w3 <b>w1 w2</b>
> {noformat}
> - highlight fields need to be TermVector.WITH_POSITIONS_OFFSETS
> - easy to apply patch due to independent package (contrib/highlighter2)
> - uses Java 1.5
> - uses query boost to score fragments (currently doesn't take idf into account, 
> but it should be possible)
> - pluggable FragListBuilder
> - pluggable FragmentsBuilder
> to do:
> - term positions may be unnecessary when phraseHighlight==false
> - collect performance numbers
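
The phrase-unit highlighting with slop described in the feature list can be illustrated with a small standalone sketch. This is not the code from the patch: the class name PhraseHighlightSketch, the whitespace tokenizer, and the greedy in-order matcher are all assumptions made for illustration, and real PhraseQuery slop semantics are more general. It only shows why both term positions (to test the slop) and character offsets (to place the tags) are needed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the highlighter from LUCENE-1522.
public class PhraseHighlightSketch {

    /** A term with its character offsets; its list index serves as its position. */
    static final class Token {
        final String term;
        final int start; // inclusive character offset
        final int end;   // exclusive character offset

        Token(String term, int start, int end) {
            this.term = term;
            this.start = start;
            this.end = end;
        }
    }

    /** Whitespace tokenizer that records offsets, standing in for a stored term vector. */
    static List<Token> tokenize(String text) {
        List<Token> tokens = new ArrayList<Token>();
        int i = 0;
        int n = text.length();
        while (i < n) {
            while (i < n && Character.isWhitespace(text.charAt(i))) i++;
            int start = i;
            while (i < n && !Character.isWhitespace(text.charAt(i))) i++;
            if (i > start) tokens.add(new Token(text.substring(start, i), start, i));
        }
        return tokens;
    }

    /**
     * Marks in-order phrase matches whose total number of intervening
     * tokens is at most slop, then wraps each contiguous run of matched
     * tokens in a single pair of <b> tags.
     */
    static String highlight(String text, String[] phrase, int slop) {
        if (phrase.length == 0) return text;
        List<Token> tokens = tokenize(text);
        boolean[] hit = new boolean[tokens.size()];

        for (int first = 0; first < tokens.size(); first++) {
            if (!tokens.get(first).term.equals(phrase[0])) continue;
            int[] matched = new int[phrase.length];
            matched[0] = first;
            int prev = first;
            int gaps = 0;
            boolean ok = true;
            for (int k = 1; k < phrase.length && ok; k++) {
                // Greedy simplification: take the earliest later occurrence of the next term.
                int next = -1;
                for (int p = prev + 1; p < tokens.size(); p++) {
                    if (tokens.get(p).term.equals(phrase[k])) { next = p; break; }
                }
                if (next < 0) { ok = false; break; }
                gaps += next - prev - 1;
                ok = gaps <= slop;
                matched[k] = next;
                prev = next;
            }
            if (ok) {
                for (int p : matched) hit[p] = true;
            }
        }

        // Use the recorded offsets to copy the original text and insert tags.
        StringBuilder out = new StringBuilder();
        int cursor = 0;
        int p = 0;
        while (p < tokens.size()) {
            if (!hit[p]) { p++; continue; }
            int runEnd = p;
            while (runEnd + 1 < tokens.size() && hit[runEnd + 1]) runEnd++;
            out.append(text, cursor, tokens.get(p).start);
            out.append("<b>")
               .append(text, tokens.get(p).start, tokens.get(runEnd).end)
               .append("</b>");
            cursor = tokens.get(runEnd).end;
            p = runEnd + 1;
        }
        out.append(text, cursor, text.length());
        return out.toString();
    }

    public static void main(String[] args) {
        // Mirrors the q="w1 w2"~1 case from the issue description.
        System.out.println(highlight("w1 w3 w2 w3 w1 w2", new String[] {"w1", "w2"}, 1));
    }
}
```

Running main on the second case from the description prints <b>w1</b> w3 <b>w2</b> w3 <b>w1 w2</b>. One simplification to be aware of: two separate matches that happen to sit next to each other are merged into a single tag, which may differ from what the patch does.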


