[ https://issues.apache.org/jira/browse/LUCENE-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744830#action_12744830 ]
Alex Vigdor commented on LUCENE-1824:
-------------------------------------
Actually, a couple of the existing tests specifically check for the faulty
behavior. The following modification of SimpleFragmentsBuilderTest tests for
the non-truncating behavior implemented in the patch. A couple of other tests
in this file now fail (those using strings such as "a b b a"), but the
failures don't seem serious to me; I would think those tests could be changed
to expect the results they get from the patch.
Index: contrib/fast-vector-highlighter/src/test/org/apache/lucene/search/vectorhighlight/SimpleFragmentsBuilderTest.java
===================================================================
--- contrib/fast-vector-highlighter/src/test/org/apache/lucene/search/vectorhighlight/SimpleFragmentsBuilderTest.java (revision 805400)
+++ contrib/fast-vector-highlighter/src/test/org/apache/lucene/search/vectorhighlight/SimpleFragmentsBuilderTest.java (working copy)
@@ -90,7 +90,7 @@
SimpleFragListBuilder sflb = new SimpleFragListBuilder();
FieldFragList ffl = sflb.createFieldFragList( fpl, 100 );
SimpleFragmentsBuilder sfb = new SimpleFragmentsBuilder();
- assertEquals( " most <b>search engines</b> use only one of these methods. Even the <b>search engines</b> that says they can use t",
+ assertEquals( " most <b>search engines</b> use only one of these methods. Even the <b>search engines</b> that says they can use the ",
sfb.createFragment( reader, 0, F, ffl ) );
}
@@ -103,7 +103,7 @@
SimpleFragListBuilder sflb = new SimpleFragListBuilder();
FieldFragList ffl = sflb.createFieldFragList( fpl, 100 );
SimpleFragmentsBuilder sfb = new SimpleFragmentsBuilder();
- assertEquals( "ssing <b>speed</b>, the", sfb.createFragment( reader, 0, F, ffl ) );
+ assertEquals( " processing <b>speed</b>, the", sfb.createFragment( reader, 0, F, ffl ) );
}
public void testUnstoredField() throws Exception {
> FastVectorHighlighter truncates words at beginning and end of fragments
> -----------------------------------------------------------------------
>
> Key: LUCENE-1824
> URL: https://issues.apache.org/jira/browse/LUCENE-1824
> Project: Lucene - Java
> Issue Type: Improvement
> Components: contrib/*
> Environment: any
> Reporter: Alex Vigdor
> Priority: Minor
> Fix For: 2.9
>
> Attachments: LUCENE-1824.patch
>
>
> FastVectorHighlighter does not take word boundaries into consideration when
> building fragments, so that in most cases the first and last word of a
> fragment are truncated. This makes the highlights less legible than they
> should be. I will attach a patch to BaseFragmentsBuilder that resolves this
> by expanding the start and end boundaries of the fragment to the first
> whitespace character on either side of the fragment, or the beginning or end
> of the source text, whichever comes first. This significantly improves
> legibility, at the cost of returning a slightly larger number of characters
> than specified for the fragment size.
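
The boundary expansion described above can be illustrated with a minimal
sketch. This is not the code from the attached patch: the class
FragmentBoundaries and its methods are hypothetical names, and the choice to
keep the delimiting whitespace character inside the fragment is an assumption
made so that the results resemble the expected strings in the test changes
(highlight tags aside). It assumes 0 <= start <= end <= text.length().

```java
// Hypothetical helper, not part of Lucene: widens a [start, end) character
// range so that it does not cut through a word.
final class FragmentBoundaries {

    private FragmentBoundaries() {}

    // Moves start left until it sits on a whitespace character or on the
    // beginning of the text, whichever comes first.
    static int expandStart(String text, int start) {
        int s = Math.max(0, Math.min(start, text.length() - 1));
        while (s > 0 && !Character.isWhitespace(text.charAt(s))) {
            s--;
        }
        return s;
    }

    // Moves end (exclusive) right until the last included character is
    // whitespace or the end of the text is reached, whichever comes first.
    static int expandEnd(String text, int end) {
        int e = Math.max(0, Math.min(end, text.length()));
        while (e < text.length()
                && !(e > 0 && Character.isWhitespace(text.charAt(e - 1)))) {
            e++;
        }
        return e;
    }

    // Returns the widened fragment; it may be longer than end - start.
    static String expandToWhitespace(String text, int start, int end) {
        return text.substring(expandStart(text, start), expandEnd(text, end));
    }
}
```

For example, with the text "they can use the both" and the range 5 to 14
("can use t"), expandToWhitespace returns " can use the ", which is four
characters longer than the requested range. That is the trade-off the
description mentions: better legibility in exchange for slightly more
characters than the requested fragment size.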