[ 
https://issues.apache.org/jira/browse/LUCENE-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744830#action_12744830
 ] 

Alex Vigdor commented on LUCENE-1824:
-------------------------------------

Actually, a couple of the existing tests specifically check for the faulty 
behavior. The following modification of SimpleFragmentsBuilderTest tests for 
the non-truncating behavior implemented in the patch.  A couple of other tests 
in this file now fail (those using strings such as "a b b a"), but the 
failures don't seem serious to me: I would think those tests could simply be 
changed to expect the results they get from the patch.

Index: contrib/fast-vector-highlighter/src/test/org/apache/lucene/search/vectorhighlight/SimpleFragmentsBuilderTest.java
===================================================================
--- contrib/fast-vector-highlighter/src/test/org/apache/lucene/search/vectorhighlight/SimpleFragmentsBuilderTest.java   (revision 805400)
+++ contrib/fast-vector-highlighter/src/test/org/apache/lucene/search/vectorhighlight/SimpleFragmentsBuilderTest.java   (working copy)
@@ -90,7 +90,7 @@
     SimpleFragListBuilder sflb = new SimpleFragListBuilder();
     FieldFragList ffl = sflb.createFieldFragList( fpl, 100 );
     SimpleFragmentsBuilder sfb = new SimpleFragmentsBuilder();
-    assertEquals( " most <b>search engines</b> use only one of these methods. Even the <b>search engines</b> that says they can use t",
+    assertEquals( " most <b>search engines</b> use only one of these methods. Even the <b>search engines</b> that says they can use the ",
         sfb.createFragment( reader, 0, F, ffl ) );
   }
 
@@ -103,7 +103,7 @@
     SimpleFragListBuilder sflb = new SimpleFragListBuilder();
     FieldFragList ffl = sflb.createFieldFragList( fpl, 100 );
     SimpleFragmentsBuilder sfb = new SimpleFragmentsBuilder();
-    assertEquals( "ssing <b>speed</b>, the", sfb.createFragment( reader, 0, F, ffl ) );
+    assertEquals( " processing <b>speed</b>, the", sfb.createFragment( reader, 0, F, ffl ) );
   }
   
   public void testUnstoredField() throws Exception {


> FastVectorHighlighter truncates words at beginning and end of fragments
> -----------------------------------------------------------------------
>
>                 Key: LUCENE-1824
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1824
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/*
>         Environment: any
>            Reporter: Alex Vigdor
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1824.patch
>
>
> FastVectorHighlighter does not take word boundaries into consideration when 
> building fragments, so that in most cases the first and last word of a 
> fragment are truncated.  This makes the highlights less legible than they 
> should be.  I will attach a patch to BaseFragmentBuilder that resolves this 
> by expanding the start and end boundaries of the fragment to the first 
> whitespace character on either side of the fragment, or the beginning or end 
> of the source text, whichever comes first.  This significantly improves 
> legibility, at the cost of returning a slightly larger number of characters 
> than specified for the fragment size.
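
For illustration only (this is not the code in LUCENE-1824.patch): a minimal 
sketch of the boundary expansion described above. The class name 
FragmentBoundaries and the method expand are hypothetical, and the sketch 
assumes the fragment is given as a [start, end) character range into the 
source text and that the bounding whitespace character is kept in the result.

```java
// Hypothetical helper, not part of Lucene: widens a fragment's character
// range so that it does not cut through a word.
final class FragmentBoundaries {

    /**
     * Returns {start, end} (end exclusive), widened outward until each side
     * reaches a whitespace character or the beginning/end of the source text.
     */
    static int[] expand(String source, int start, int end) {
        int len = source.length();
        if (len == 0) {
            return new int[] {0, 0};
        }
        // Clamp the requested range into the source text.
        int s = Math.max(0, Math.min(start, len - 1));
        int e = Math.max(s, Math.min(end, len));

        // Move the start left until it sits on whitespace or at offset 0.
        while (s > 0 && !Character.isWhitespace(source.charAt(s))) {
            s--;
        }
        // Move the end right until the last included character is
        // whitespace or the end of the text is reached.
        while (e < len && (e == 0 || !Character.isWhitespace(source.charAt(e - 1)))) {
            e++;
        }
        return new int[] {s, e};
    }
}
```

With the text "fast processing speed, the best" and the range [10, 25) 
(which covers "ssing speed, th"), the widened range is [4, 27), i.e. 
" processing speed, the ". The result is longer than the requested fragment 
size, which is the trade-off mentioned in the description.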


