Chris Earle created LUCENE-6334:
-----------------------------------

             Summary: Term Vector Highlighter does not properly span neighboring term offsets
                 Key: LUCENE-6334
                 URL: https://issues.apache.org/jira/browse/LUCENE-6334
             Project: Lucene - Core
          Issue Type: Bug
          Components: core/termvectors, modules/highlighter
            Reporter: Chris Earle


If you use term vectors for fast vector highlighting on a multivalued field and match a phrase that crosses two values, the phrase is not highlighted, even though the highlighter _properly_ finds the correct values to highlight.

Matching source code is a good example, where the values might be lines like:

{code}
one two three five
two three four
five six five
six seven eight nine eight nine eight nine eight nine eight nine eight nine
eight nine
ten eleven
twelve thirteen
{code}

Matching the phrase "four five" returns:

{code}
two three four
five six five
six seven eight nine eight nine eight nine eight nine eight
eight nine
ten eleven
{code}

However, neither "four" (at the end of the first line) nor "five" (at the start of the second line) is highlighted.
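To see why the phrase crosses two values, here is a minimal sketch that lays out the first two returned values as one offset space. It assumes each line is a separate value and that values are separated by an offset gap of 1 (the default of Lucene's `Analyzer#getOffsetGap`). It also assumes the phrase match covers the offsets from the start of "four" to the end of "five". The class and method names are hypothetical, not Lucene API.

```java
import java.util.ArrayList;
import java.util.List;

public class MultiValueOffsets {
    // Assumption: values are laid out back to back with an offset gap of 1,
    // mirroring the default of Lucene's Analyzer#getOffsetGap.
    static final int OFFSET_GAP = 1;

    /** Returns {start, end} offsets (end exclusive) of each value in the combined offset space. */
    static List<int[]> valueRanges(String[] values) {
        List<int[]> ranges = new ArrayList<>();
        int base = 0;
        for (String v : values) {
            ranges.add(new int[] {base, base + v.length()});
            base += v.length() + OFFSET_GAP;
        }
        return ranges;
    }

    /** Index of the value whose range contains the given offset, or -1 if none does. */
    static int valueIndexOf(List<int[]> ranges, int offset) {
        for (int i = 0; i < ranges.size(); i++) {
            int[] r = ranges.get(i);
            if (offset >= r[0] && offset < r[1]) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String[] values = {"two three four", "five six five"};
        List<int[]> ranges = valueRanges(values);
        // "four" is at the end of value 0; "five" is at the start of value 1.
        int fourStart = ranges.get(0)[0] + values[0].indexOf("four");
        int fiveEnd = ranges.get(1)[0] + "five".length();
        // Assumed phrase offsets [fourStart, fiveEnd) cross the boundary between the two values.
        System.out.println("phrase offsets: [" + fourStart + ", " + fiveEnd + ")");
        System.out.println("starts in value " + valueIndexOf(ranges, fourStart)
            + ", ends in value " + valueIndexOf(ranges, fiveEnd - 1));
    }
}
```

Under these assumptions the phrase occupies offsets 10 to 19, starting in value 0 and ending in value 1. A check that only accepts offsets lying entirely inside one value therefore rejects it for both values.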

The problem lies in [BaseFragmentsBuilder at line 269|https://github.com/apache/lucene-solr/blob/trunk/lucene/highlighter/src/java/org/apache/lucene/search/vectorhighlight/BaseFragmentsBuilder.java#L269], because it only accepts term offsets that fall entirely within a single field value and does not check for offsets that cross a value boundary:

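{code}
boolean started = toffs.getStartOffset() >= fieldStart;
boolean ended = toffs.getEndOffset() <= fieldEnd;

// existing behavior: the term offset lies entirely within this field value
if (started && ended) {
    toffsList.add(toffs);
    toffsIterator.remove();
}
// starts inside this value and continues into the next one
else if (started && toffs.getStartOffset() < fieldEnd) {
    toffsList.add(new Toffs(toffs.getStartOffset(), fieldEnd));
    // toffsIterator.remove(); // is this necessary?
}
// started in an earlier value and ends inside this one
else if (ended && toffs.getEndOffset() > fieldStart) {
    toffsList.add(new Toffs(fieldStart, toffs.getEndOffset()));
    // toffsIterator.remove(); // is this necessary?
}
// starts before and ends after this value, i.e. spans the whole field value
else if (!started && !ended) {
    toffsList.add(new Toffs(fieldStart, fieldEnd));
    // toffsIterator.remove(); // is this necessary?
}
// otherwise the term offset lies entirely before or after this value: skip it
{code}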
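The four branches above amount to clipping an offset range to the current value's range. Here is a self-contained sketch of that idea. `Span`, `clip` and `splitAcrossValues` are hypothetical stand-ins, not Lucene's `Toffs` or `BaseFragmentsBuilder` API. The value ranges assume the first two lines of the example are separate values with an offset gap of 1.

```java
import java.util.ArrayList;
import java.util.List;

public class OffsetClipping {
    /** Hypothetical stand-in for Lucene's Toffs: a [start, end) offset pair. */
    record Span(int start, int end) {}

    /**
     * Clips a span to one field value's [fieldStart, fieldEnd) range.
     * Covers all four branches (inside, starts inside, ends inside, spans the value)
     * and returns null when the span does not overlap the value at all.
     */
    static Span clip(Span span, int fieldStart, int fieldEnd) {
        int start = Math.max(span.start(), fieldStart);
        int end = Math.min(span.end(), fieldEnd);
        return start < end ? new Span(start, end) : null;
    }

    /** Splits one span into the pieces that fall inside each value range ({start, end}). */
    static List<Span> splitAcrossValues(Span span, List<int[]> valueRanges) {
        List<Span> pieces = new ArrayList<>();
        for (int[] r : valueRanges) {
            Span piece = clip(span, r[0], r[1]);
            if (piece != null) {
                pieces.add(piece);
            }
        }
        return pieces;
    }

    public static void main(String[] args) {
        // Assumed layout: "two three four" -> [0, 14), "five six five" -> [15, 28).
        List<int[]> ranges = List.of(new int[] {0, 14}, new int[] {15, 28});
        Span phrase = new Span(10, 19); // "four five", crossing the value boundary
        for (Span piece : splitAcrossValues(phrase, ranges)) {
            System.out.println(piece);
        }
    }
}
```

With this layout the phrase is split into `Span[start=10, end=14]` ("four") and `Span[start=15, end=19]` ("five"), one highlight per value. Using `max`/`min` keeps the overlap test in one place, which also guards against the case where a span lies entirely before or after the value.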



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
