[ https://issues.apache.org/jira/browse/LUCENE-1822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Sekiguchi reassigned LUCENE-1822:
--------------------------------------

    Assignee: Koji Sekiguchi
    
> FastVectorHighlighter: SimpleFragListBuilder hard-coded 6 char margin is too 
> naive
> ----------------------------------------------------------------------------------
>
>                 Key: LUCENE-1822
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1822
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/highlighter
>    Affects Versions: 2.9
>         Environment: any
>            Reporter: Alex Vigdor
>            Assignee: Koji Sekiguchi
>            Priority: Minor
>         Attachments: LUCENE-1822.patch
>
>
> The new FastVectorHighlighter performs extremely well; however, in testing
> I've found that the window of text chosen per fragment is often very poor,
> because SimpleFragListBuilder is hard-coded to always start the fragment 6
> characters to the left of the first phrase match. When selecting long
> fragments, this often means that there is barely any context before the
> highlighted word and a lot after it. Even worse, when highlighting a phrase
> at the end of a short text, the beginning is cut off even though the entire
> text would fit in the specified fragCharSize. For example, highlighting
> "Punishment" in "Crime and Punishment" returns "e and <b>Punishment</b>" no
> matter what fragCharSize is specified.
>
> I am going to attach a patch that improves the text window selection by
> recalculating the starting margin once all phrases in the fragment have been
> identified. This way, if a single word is matched in a fragment, it will
> appear in the middle of the highlight instead of 6 characters from the
> beginning. One can also guarantee that a short text is represented in its
> entirety in a fragment by specifying a large enough fragCharSize.
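
The difference between the two margin strategies described above can be
sketched in a few lines of Java. This is only an illustration under stated
assumptions: it is neither the attached patch nor Lucene's
SimpleFragListBuilder code, and the class and method names
(FragmentWindowSketch, fixedMarginStart, centeredStart, fragment) are
hypothetical. It treats the matched phrases as a single character span and
ignores highlight tags.

```java
// Hypothetical sketch; not Lucene code and not the LUCENE-1822 patch.
class FragmentWindowSketch {

    // Fixed-margin strategy: start a fixed number of characters to the
    // left of the first match, clamped to the beginning of the text.
    static int fixedMarginStart(int firstMatchStart, int margin) {
        return Math.max(0, firstMatchStart - margin);
    }

    // Centered strategy (an assumption about what "recalculating the
    // starting margin" could look like): once the span covered by all
    // matches [matchStart, matchEnd) is known, split the unused part of
    // fragCharSize evenly on both sides, then clamp to the text bounds.
    static int centeredStart(int matchStart, int matchEnd,
                             int fragCharSize, int textLength) {
        int matchedLength = matchEnd - matchStart;
        int margin = Math.max(0, (fragCharSize - matchedLength) / 2);
        int start = Math.max(0, matchStart - margin);
        // If the window would run past the end, shift it left instead of
        // wasting the remaining budget.
        if (start + fragCharSize > textLength) {
            start = Math.max(0, textLength - fragCharSize);
        }
        return start;
    }

    // Cut the fragment out of the text, never reading past the end.
    static String fragment(String text, int start, int fragCharSize) {
        int end = Math.min(text.length(), start + fragCharSize);
        return text.substring(start, end);
    }

    public static void main(String[] args) {
        String text = "Crime and Punishment";
        int matchStart = text.indexOf("Punishment");
        int matchEnd = matchStart + "Punishment".length();
        int fragCharSize = 100;

        int fixed = fixedMarginStart(matchStart, 6);
        // prints "e and Punishment"
        System.out.println(fragment(text, fixed, fragCharSize));

        int centered = centeredStart(matchStart, matchEnd,
                                     fragCharSize, text.length());
        // prints "Crime and Punishment"
        System.out.println(fragment(text, centered, fragCharSize));
    }
}
```

With a fixed margin, the start offset depends only on the position of the
first match, so fragCharSize has no influence on how much leading context is
shown. In the centered variant, the start offset is derived from
fragCharSize, so a large enough value covers the whole of a short text.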

