[ https://issues.apache.org/jira/browse/LUCENE-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744830#action_12744830 ]
Alex Vigdor commented on LUCENE-1824:
-------------------------------------

Actually, a couple of the existing tests specifically check for the faulty behavior. The following modification of SimpleFragmentsBuilderTest tests for the non-truncating behavior implemented in the patch. A couple of other tests in this file fail now (those with the strings "a b b a" etc.), but they don't seem serious to me (i.e. I would think the tests could be changed to test for the results they get from the patch).

Index: contrib/fast-vector-highlighter/src/test/org/apache/lucene/search/vectorhighlight/SimpleFragmentsBuilderTest.java
===================================================================
--- contrib/fast-vector-highlighter/src/test/org/apache/lucene/search/vectorhighlight/SimpleFragmentsBuilderTest.java (revision 805400)
+++ contrib/fast-vector-highlighter/src/test/org/apache/lucene/search/vectorhighlight/SimpleFragmentsBuilderTest.java (working copy)
@@ -90,7 +90,7 @@
     SimpleFragListBuilder sflb = new SimpleFragListBuilder();
     FieldFragList ffl = sflb.createFieldFragList( fpl, 100 );
     SimpleFragmentsBuilder sfb = new SimpleFragmentsBuilder();
-    assertEquals( " most <b>search engines</b> use only one of these methods. Even the <b>search engines</b> that says they can use t",
+    assertEquals( " most <b>search engines</b> use only one of these methods. Even the <b>search engines</b> that says they can use the ",
         sfb.createFragment( reader, 0, F, ffl ) );
   }
 
@@ -103,7 +103,7 @@
     SimpleFragListBuilder sflb = new SimpleFragListBuilder();
     FieldFragList ffl = sflb.createFieldFragList( fpl, 100 );
     SimpleFragmentsBuilder sfb = new SimpleFragmentsBuilder();
-    assertEquals( "ssing <b>speed</b>, the", sfb.createFragment( reader, 0, F, ffl ) );
+    assertEquals( " processing <b>speed</b>, the", sfb.createFragment( reader, 0, F, ffl ) );
   }
 
   public void testUnstoredField() throws Exception {

> FastVectorHighlighter truncates words at beginning and end of fragments
> -----------------------------------------------------------------------
>
>                 Key: LUCENE-1824
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1824
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/*
>         Environment: any
>            Reporter: Alex Vigdor
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1824.patch
>
>
> FastVectorHighlighter does not take word boundaries into consideration when
> building fragments, so in most cases the first and last words of a
> fragment are truncated. This makes the highlights less legible than they
> should be. I will attach a patch to BaseFragmentsBuilder that resolves this
> by expanding the start and end boundaries of the fragment to the first
> whitespace character on either side of the fragment, or to the beginning or
> end of the source text, whichever comes first. This significantly improves
> legibility, at the cost of returning slightly more characters than the
> specified fragment size.
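
For readers following the thread, the boundary expansion described in the issue might look roughly like the sketch below. This is not the attached patch: the class FragmentBoundarySketch and the method expandFragment are hypothetical names, and the choice to keep the bounding whitespace character in the result is an assumption inferred from the expected strings in the test diff (" processing <b>speed</b>, the" and "... can use the "). Highlight tags are left out to keep the offset arithmetic visible.

```java
public final class FragmentBoundarySketch {
    private FragmentBoundarySketch() {}

    /**
     * Widens the fragment [start, end) of source so that it begins and ends
     * at a whitespace character or at the edge of the source text.
     * Assumption: the bounding whitespace character is kept in the result.
     */
    public static String expandFragment(String source, int start, int end) {
        int len = source.length();
        // Clamp the offsets so the loops below never index out of range.
        int s = Math.max(0, Math.min(start, len));
        int e = Math.max(s, Math.min(end, len));

        // Move the start left until it sits on whitespace or at offset 0.
        while (s > 0 && s < len && !Character.isWhitespace(source.charAt(s))) {
            s--;
        }
        // Move the end right until the last included character is whitespace
        // or the end of the text is reached.
        while (e > 0 && e < len && !Character.isWhitespace(source.charAt(e - 1))) {
            e++;
        }
        return source.substring(s, e);
    }

    public static void main(String[] args) {
        String text = "the processing speed, the";
        // Offsets 9..25 cover "ssing speed, the", a fragment cut mid-word.
        System.out.println("[" + expandFragment(text, 9, 25) + "]");
    }
}
```

As in the issue description, the result can be somewhat longer than the requested fragment size, since each boundary may move by up to one word.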