[ https://issues.apache.org/jira/browse/LUCENE-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745048#action_12745048 ]
Alex Vigdor commented on LUCENE-1824:
-------------------------------------

The failing test was due to an extra whitespace character at the beginning of the output, which I think is insignificant. However, I appreciate that the whitespace approach will not work for CJK, so I have moved my modifications to a new WhitespaceFragmentBuilder class and associated test class. The updated patch now contains just these two new classes and no modifications to other code.

I don't want to hold up the release of 2.9, but anyone attempting to use the SimpleFragmentsBuilder with Latin languages, or others that use whitespace to delimit words, will be dismayed by the rampant truncation!

> FastVectorHighlighter truncates words at beginning and end of fragments
> -----------------------------------------------------------------------
>
>                 Key: LUCENE-1824
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1824
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/*
>         Environment: any
>            Reporter: Alex Vigdor
>            Priority: Minor
>             Fix For: 3.1
>
>         Attachments: LUCENE-1824.patch
>
>
> FastVectorHighlighter does not take word boundaries into consideration when
> building fragments, so that in most cases the first and last word of a
> fragment are truncated. This makes the highlights less legible than they
> should be. I will attach a patch to BaseFragmentBuilder that resolves this
> by expanding the start and end boundaries of the fragment to the first
> whitespace character on either side of the fragment, or the beginning or end
> of the source text, whichever comes first. This significantly improves
> legibility, at the cost of returning a slightly larger number of characters
> than specified for the fragment size.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
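
For readers following the issue, the boundary expansion described in the quoted description might look roughly like the sketch below. This is not the code from LUCENE-1824.patch: the class name WhitespaceBoundaryExpander and its methods are hypothetical, it uses only the JDK, and it assumes 0 <= start <= end <= text.length(). Unlike the behaviour mentioned in the comment above, this sketch stops just inside the whitespace, so the fragment carries no leading space.

```java
// Hypothetical helper, not part of Lucene or of the attached patch.
final class WhitespaceBoundaryExpander {
    private WhitespaceBoundaryExpander() {}

    /** Moves start left until it sits just after whitespace or at offset 0. */
    static int expandStart(CharSequence text, int start) {
        int s = Math.max(0, Math.min(start, text.length()));
        while (s > 0 && !Character.isWhitespace(text.charAt(s - 1))) {
            s--;
        }
        return s;
    }

    /** Moves end right until it sits on whitespace or at the end of the text. */
    static int expandEnd(CharSequence text, int end) {
        int e = Math.max(0, Math.min(end, text.length()));
        while (e < text.length() && !Character.isWhitespace(text.charAt(e))) {
            e++;
        }
        return e;
    }

    /** Returns the fragment widened to whole words; may exceed end - start chars. */
    static String fragment(String text, int start, int end) {
        return text.substring(expandStart(text, start), expandEnd(text, end));
    }
}
```

With the text "the quick brown fox jumps", the requested range 6 to 12 ("ick br") widens to "quick brown", which shows the trade-off noted in the description: the result is longer than the requested fragment size. Splitting on whitespace this way would not help for CJK text, as the comment acknowledges.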