another highlighter
-------------------

                 Key: LUCENE-1522
                 URL: https://issues.apache.org/jira/browse/LUCENE-1522
             Project: Lucene - Java
          Issue Type: Improvement
          Components: contrib/highlighter
            Reporter: Koji Sekiguchi
            Priority: Minor


I've written this highlighter for my project to support bi-gram token stream. 
The idea was inherited from my previous project with my colleague and 
LUCENE-644. This approach needs highlight fields to be 
TermVector.WITH_POSITIONS_OFFSETS, but is fast and can support N-grams. This 
depends on LUCENE-1448 to get refined term offsets.

usage:
{code:java}
Highlighter h = new Highlighter();
FieldQuery fq = h.getFieldQuery( query );
// docId=0, fieldName="content", fragCharSize=100, numFragments=3
String[] fragments = h.getBestFragments( fq, reader, 0, "content", 100, 3 );
{code}

features:
- fast for large docs
- supports "fixed size" N-gram (e.g. (2,2), not (1,3)) (can solve LUCENE-1489)
- supports PhraseQuery, phrase-unit highlighting with slops
{noformat}
q="w1 w2"
<b>w1 w2</b>
---------------
q="w1 w2"~1
<b>w1</b> w3 <b>w2</b> w3 <b>w1 w2</b>
{noformat}
- highlight fields need to be TermVector.WITH_POSITIONS_OFFSETS
- easy to apply patch due to independent package (contrib/highlighter2)
- uses Java 1.5
- looks query boost to score fragments (currently doesn't see idf, but it 
should be possible)
- pluggable FragListBuilder
- pluggable FragmentsBuilder

to do:
- term positions can be unnecessary when phraseHighlight==false
- collects performance numbers


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

Reply via email to