Chris Earle created LUCENE-6334:
-----------------------------------
Summary: Term Vector Highlighter does not properly span
neighboring term offsets
Key: LUCENE-6334
URL: https://issues.apache.org/jira/browse/LUCENE-6334
Project: Lucene - Core
Issue Type: Bug
Components: core/termvectors, modules/highlighter
Reporter: Chris Earle
If you use term vectors for fast vector highlighting on a multivalued field
and match a phrase that crosses two values, the highlighter does not highlight
the phrase, even though it _properly_ finds the correct values to highlight.
A good example of this is when matching source code, where you might have lines
like:
{code}
one two three five
two three four
five six five
six seven eight nine eight nine eight nine eight nine eight nine eight nine
eight nine
ten eleven
twelve thirteen
{code}
Matching the phrase "four five" returns:
{code}
two three four
five six five
six seven eight nine eight nine eight nine eight nine eight
eight nine
ten eleven
{code}
However, it does not properly highlight "four" (on the first line) and "five"
(on the second line).
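To make the failure mode concrete, here is a minimal sketch, independent of Lucene, of how a phrase match can straddle two values of a multivalued field. It assumes the values are laid out one after another in a single offset space with a one-character gap between them; the class name, method names, and `OFFSET_GAP` constant are hypothetical and only serve the illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MultiValueOffsetsSketch {
    // Assumed gap between consecutive values in the shared offset space.
    static final int OFFSET_GAP = 1;

    /** Returns {start, end} (end exclusive) for each value in the shared offset space. */
    public static List<int[]> valueRanges(List<String> values) {
        List<int[]> ranges = new ArrayList<>();
        int start = 0;
        for (String value : values) {
            ranges.add(new int[] {start, start + value.length()});
            start += value.length() + OFFSET_GAP;
        }
        return ranges;
    }

    /** Returns the indexes of the values that the range [matchStart, matchEnd) overlaps. */
    public static List<Integer> valuesTouched(List<int[]> ranges, int matchStart, int matchEnd) {
        List<Integer> touched = new ArrayList<>();
        for (int i = 0; i < ranges.size(); i++) {
            int[] r = ranges.get(i);
            if (matchStart < r[1] && matchEnd > r[0]) {
                touched.add(i);
            }
        }
        return touched;
    }

    public static void main(String[] args) {
        List<String> values =
            Arrays.asList("one two three five", "two three four", "five six five");
        List<int[]> ranges = valueRanges(values);
        // "four" is the last term of value 1 and "five" is the first term of value 2.
        int matchStart = ranges.get(1)[1] - "four".length();
        int matchEnd = ranges.get(2)[0] + "five".length();
        System.out.println(valuesTouched(ranges, matchStart, matchEnd));
    }
}
```

Under these assumptions the match overlaps two values at once, so a check that only accepts offsets lying entirely inside one value rejects it for both.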
The problem lies in the [BaseFragmentsBuilder at line 269|https://github.com/apache/lucene-solr/blob/trunk/lucene/highlighter/src/java/org/apache/lucene/search/vectorhighlight/BaseFragmentsBuilder.java#L269]
because it does not check for cross-coverage:
{code}
boolean started = toffs.getStartOffset() >= fieldStart;
boolean ended = toffs.getEndOffset() <= fieldEnd;

// existing behavior: the offsets lie entirely within this field value
if (started && ended) {
  toffsList.add(toffs);
  toffsIterator.remove();
}
// begins at or after fieldStart but ends after fieldEnd: clip the end
else if (started) {
  toffsList.add(new Toffs(toffs.getStartOffset(), fieldEnd));
  // toffsIterator.remove(); // is this necessary?
}
// begins before fieldStart but ends at or before fieldEnd: clip the start
else if (ended) {
  toffsList.add(new Toffs(fieldStart, toffs.getEndOffset()));
  // toffsIterator.remove(); // is this necessary?
}
else if (toffs.getEndOffset() > fieldEnd) {
  // ie the toffs spans the whole field
  toffsList.add(new Toffs(fieldStart, fieldEnd));
  // toffsIterator.remove(); // is this necessary?
}
{code}
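The four branches above amount to intersecting the term offsets with the current value's range. A minimal standalone sketch of that idea follows; it does not use Lucene's `Toffs` class, the class and method names are hypothetical, and the sample offsets assume the three values "one two three five", "two three four" and "five six five" laid out consecutively with a one-character gap.

```java
import java.util.ArrayList;
import java.util.List;

public class OffsetClipSketch {
    /**
     * Intersects the match range [matchStart, matchEnd) with each value range
     * and returns the non-empty pieces as {start, end} pairs.
     */
    public static List<int[]> clipToValues(int matchStart, int matchEnd, List<int[]> valueRanges) {
        List<int[]> pieces = new ArrayList<>();
        for (int[] value : valueRanges) {
            int start = Math.max(matchStart, value[0]); // clip to the value's start
            int end = Math.min(matchEnd, value[1]);     // clip to the value's end
            if (start < end) {                          // skip values the match does not touch
                pieces.add(new int[] {start, end});
            }
        }
        return pieces;
    }

    public static void main(String[] args) {
        List<int[]> valueRanges = new ArrayList<>();
        valueRanges.add(new int[] {0, 18});  // "one two three five" (assumed offsets)
        valueRanges.add(new int[] {19, 33}); // "two three four"
        valueRanges.add(new int[] {34, 47}); // "five six five"
        // The phrase "four five" runs from the last term of the second value
        // to the first term of the third value.
        for (int[] piece : clipToValues(29, 38, valueRanges)) {
            System.out.println("[" + piece[0] + ", " + piece[1] + ")");
        }
    }
}
```

Unlike the branch-per-case form, the max/min form also drops ranges that do not overlap the value at all, which may or may not matter depending on how the surrounding loop filters its input.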