[ 
https://issues.apache.org/jira/browse/LUCENE-1822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13472071#comment-13472071
 ] 

Koji Sekiguchi commented on LUCENE-1822:
----------------------------------------

bq. Is this something we should deal with?

I don't think so. The cause is that the indexed test data contains empty 
values in a multi-valued field:

{code}
protected static final String[] shortMVValues = {
  "",
  "",
  "a b c",
  "",   // empty data in multi valued field
  "d e"
};
{code}

and these spaces were not trimmed before the patch was applied. We can open 
another ticket for trimming spaces if needed. Thanks for letting me know 
anyway, Arcadius.
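
To illustrate where such spaces can come from, here is a minimal sketch. It assumes the values of the multi-valued field are concatenated with a single-space separator; the comment does not say how they are combined, so treat that as an assumption. The class and the join helper are hypothetical and are not part of the patch.

```java
class MultiValuedJoinSketch {
    // Same test values as shortMVValues in the comment.
    static final String[] SHORT_MV_VALUES = { "", "", "a b c", "", "d e" };

    // Hypothetical helper: concatenates the values, placing the separator
    // between each pair of neighbours (assumed behaviour, see lead-in).
    static String join(String[] values, char separator) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.length; i++) {
            if (i > 0) {
                sb.append(separator);
            }
            sb.append(values[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String joined = join(SHORT_MV_VALUES, ' ');
        // The empty values contribute nothing but their separators.
        System.out.println("[" + joined + "]");        // prints "[  a b c  d e]"
        // Trimming removes only the leading and trailing spaces.
        System.out.println("[" + joined.trim() + "]"); // prints "[a b c  d e]"
    }
}
```

Under that assumption, the two leading empty values leave two leading spaces in the text, which is the kind of difference trimming would change.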

I'll commit shortly.
                
> FastVectorHighlighter: SimpleFragListBuilder hard-coded 6 char margin is too 
> naive
> ----------------------------------------------------------------------------------
>
>                 Key: LUCENE-1822
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1822
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/highlighter
>    Affects Versions: 2.9
>         Environment: any
>            Reporter: Alex Vigdor
>            Assignee: Koji Sekiguchi
>            Priority: Minor
>         Attachments: LUCENE-1822.patch, LUCENE-1822.patch, LUCENE-1822.patch, 
> LUCENE-1822-tests.patch
>
>
> The new FastVectorHighlighter performs extremely well, however I've found in 
> testing that the window of text chosen per fragment is often very poor, as it 
> is hard coded in SimpleFragListBuilder to always select starting 6 characters 
> to the left of the first phrase match in a fragment.  When selecting long 
> fragments, this often means that there is barely any context before the 
> highlighted word, and lots after; even worse, when highlighting a phrase at 
> the end of a short text the beginning is cut off, even though the entire 
> phrase would fit in the specified fragCharSize.  For example, highlighting 
> "Punishment" in "Crime and Punishment"  returns "e and <b>Punishment</b>" no 
> matter what fragCharSize is specified.  I am going to attach a patch that 
> improves the text window selection by recalculating the starting margin once 
> all phrases in the fragment have been identified - this way if a single word 
> is matched in a fragment, it will appear in the middle of the highlight, 
> instead of 6 characters from the beginning.  This way one can also guarantee 
> that the entirety of short texts are represented in a fragment by specifying 
> a large enough fragCharSize.
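
The difference between the two window choices described in the issue can be sketched as follows. This is only an illustration of the idea, not the code in SimpleFragListBuilder or in the attached patch; the class and method names are hypothetical, and centering a single match by splitting the leftover space evenly is an assumed reading of "recalculating the starting margin".

```java
class FragmentWindowSketch {
    // Fixed margin: always start 6 characters to the left of the match.
    static int fixedMarginStart(int matchStart) {
        return Math.max(0, matchStart - 6);
    }

    // Recalculated margin (assumed): split the space not used by the match
    // evenly, so a single match lands in the middle of the fragment.
    static int centeredStart(int matchStart, int matchEnd, int fragCharSize) {
        int matchLength = matchEnd - matchStart;
        int margin = Math.max(0, (fragCharSize - matchLength) / 2);
        return Math.max(0, matchStart - margin);
    }

    // Cuts a fragment of at most fragCharSize characters beginning at start.
    static String fragment(String text, int start, int fragCharSize) {
        int end = Math.min(text.length(), start + fragCharSize);
        return text.substring(start, end);
    }

    public static void main(String[] args) {
        String text = "Crime and Punishment";
        int matchStart = text.indexOf("Punishment"); // 10
        int matchEnd = matchStart + "Punishment".length(); // 20
        int fragCharSize = 100;

        // prints "e and Punishment"
        System.out.println(fragment(text, fixedMarginStart(matchStart), fragCharSize));
        // prints "Crime and Punishment"
        System.out.println(fragment(text, centeredStart(matchStart, matchEnd, fragCharSize), fragCharSize));
    }
}
```

With the fixed margin the start offset is 10 - 6 = 4 regardless of fragCharSize, which reproduces the "e and Punishment" result from the description. With the recalculated margin, a fragCharSize larger than the text lets the whole short text fit in the fragment.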
