FastVectorHighlighter: SimpleFragListBuilder hard-coded 6 char margin is too 
naive
----------------------------------------------------------------------------------

                 Key: LUCENE-1822
                 URL: https://issues.apache.org/jira/browse/LUCENE-1822
             Project: Lucene - Java
          Issue Type: Improvement
          Components: contrib/*
    Affects Versions: 2.9
         Environment: any
            Reporter: Alex Vigdor
            Priority: Minor
         Attachments: LUCENE-1822.patch

The new FastVectorHighlighter performs extremely well.  However, I've found in 
testing that the window of text chosen per fragment is often very poor, because 
SimpleFragListBuilder is hard-coded to always start the fragment 6 characters 
to the left of the first phrase match.  When selecting long fragments, this 
often means that there is barely any context before the highlighted word and 
lots after it.  Even worse, when highlighting a phrase at the end of a short 
text, the beginning is cut off even though the entire text would fit in the 
specified fragCharSize.  For example, highlighting "Punishment" in 
"Crime and Punishment" returns "e and <b>Punishment</b>" no matter what 
fragCharSize is specified.  I am going to attach a patch that improves the 
text window selection by recalculating the starting margin once all phrases in 
the fragment have been identified.  This way, if a single word is matched in a 
fragment, it will appear in the middle of the highlight instead of 6 
characters from the beginning.  It also guarantees that the entirety of a 
short text is represented in a fragment when a large enough fragCharSize is 
specified.
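
To make the difference concrete, here is a minimal, self-contained sketch of 
the two ways of choosing a fragment start.  It is not the attached patch and 
does not use any Lucene classes; FragmentWindow, fixedMarginStart, 
centeredStart and fragment are hypothetical names invented for illustration, 
and the centering rule (split the unused characters evenly on both sides of 
the matched span) is an assumption about what "middle of the highlight" means.

```java
// Illustration only: hypothetical helper, not part of Lucene or the patch.
final class FragmentWindow {

    // The margin the issue says is hard-coded in SimpleFragListBuilder.
    static final int FIXED_MARGIN = 6;

    // Behaviour described in the issue: start a fixed number of
    // characters to the left of the first match, clamped at 0.
    static int fixedMarginStart(int firstMatchStart) {
        return Math.max(0, firstMatchStart - FIXED_MARGIN);
    }

    // Assumed alternative: once the span covered by all matches is known,
    // put half of the unused fragment characters in front of it.
    static int centeredStart(int firstMatchStart, int lastMatchEnd, int fragCharSize) {
        int matchedLength = lastMatchEnd - firstMatchStart;
        int spare = Math.max(0, fragCharSize - matchedLength);
        return Math.max(0, firstMatchStart - spare / 2);
    }

    // Cut the fragment out of the text, clamped to the end of the text.
    static String fragment(String text, int start, int fragCharSize) {
        int end = Math.min(text.length(), start + fragCharSize);
        return text.substring(start, end);
    }

    public static void main(String[] args) {
        String text = "Crime and Punishment";
        String term = "Punishment";
        int matchStart = text.indexOf(term);
        int matchEnd = matchStart + term.length();
        int fragCharSize = 100;

        // Fixed margin: the beginning of the short text is cut off.
        System.out.println(fragment(text, fixedMarginStart(matchStart), fragCharSize));
        // Centered margin: the whole short text fits in the fragment.
        System.out.println(fragment(text, centeredStart(matchStart, matchEnd, fragCharSize), fragCharSize));
    }
}
```

With the fixed margin the first line printed is "e and Punishment", matching 
the example above; with the centered start the whole title is returned 
because the computed start is clamped to 0.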


