FastVectorHighlighter: SimpleFragListBuilder hard-coded 6 char margin is too
naive
----------------------------------------------------------------------------------
Key: LUCENE-1822
URL: https://issues.apache.org/jira/browse/LUCENE-1822
Project: Lucene - Java
Issue Type: Improvement
Components: contrib/*
Affects Versions: 2.9
Environment: any
Reporter: Alex Vigdor
Priority: Minor
Attachments: LUCENE-1822.patch
The new FastVectorHighlighter performs extremely well. However, I've found in
testing that the window of text chosen per fragment is often very poor, because
SimpleFragListBuilder is hard-coded to always start the fragment 6 characters
to the left of the first phrase match. When selecting long fragments, this
often means that there is barely any context before the highlighted word and
lots after. Even worse, when highlighting a phrase at the end of a short text,
the beginning is cut off even though the entire text would fit in the
specified fragCharSize. For example, highlighting "Punishment" in "Crime and
Punishment" returns "e and <b>Punishment</b>" no matter what fragCharSize is
specified.

I am going to attach a patch that improves the text window selection by
recalculating the starting margin once all phrases in the fragment have been
identified. This way, if a single word is matched in a fragment, it will appear
in the middle of the highlight instead of 6 characters from the beginning. It
also guarantees that the entirety of a short text is represented in a fragment
when a large enough fragCharSize is specified.
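
To make the difference concrete, here is a minimal sketch of the two window
calculations described above. It is not the attached patch and does not use
any Lucene classes: FragmentWindow, fixedMarginStart, centeredStart and window
are hypothetical names introduced only for illustration, and the centering
rule (split the spare characters evenly, then clamp to the text bounds) is an
assumption about one reasonable way to recalculate the margin.

```java
// Hypothetical illustration only; FragmentWindow is not part of Lucene
// and this is not the code in LUCENE-1822.patch.
final class FragmentWindow {
    private FragmentWindow() {}

    /** Behaviour described in the issue: start a fixed margin left of the first match. */
    static int fixedMarginStart(int firstMatchStart, int margin) {
        return Math.max(0, firstMatchStart - margin);
    }

    /**
     * Assumed alternative: once the first and last match offsets in the
     * fragment are known, split the unused characters evenly on both sides,
     * then clamp the window so it stays inside the text.
     */
    static int centeredStart(int firstMatchStart, int lastMatchEnd,
                             int fragCharSize, int textLength) {
        int matchedSpan = lastMatchEnd - firstMatchStart;
        int spare = Math.max(0, fragCharSize - matchedSpan);
        int start = firstMatchStart - spare / 2;
        // If the window would run past the end of the text, shift it left.
        if (start + fragCharSize > textLength) {
            start = textLength - fragCharSize;
        }
        // Never start before the beginning of the text.
        return Math.max(0, start);
    }

    /** Returns at most fragCharSize characters of text, beginning at start. */
    static String window(String text, int start, int fragCharSize) {
        int end = Math.min(text.length(), start + fragCharSize);
        return text.substring(start, end);
    }

    public static void main(String[] args) {
        String text = "Crime and Punishment";
        int matchStart = text.indexOf("Punishment");
        int matchEnd = matchStart + "Punishment".length();
        int fragCharSize = 100;

        // Fixed 6-character margin: prints "e and Punishment"
        System.out.println(
            window(text, fixedMarginStart(matchStart, 6), fragCharSize));

        // Recalculated margin: prints "Crime and Punishment"
        System.out.println(
            window(text,
                   centeredStart(matchStart, matchEnd, fragCharSize, text.length()),
                   fragCharSize));
    }
}
```

With a longer text, for example a 200-character field with a single match at
offsets 100 to 110 and a fragCharSize of 50, this sketch starts the window at
offset 80, leaving 20 characters of context on each side of the match.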