David Smiley created LUCENE-8446:
------------------------------------

             Summary: UnifiedHighlighter DefaultPassageFormatter should merge overlapping offsets
                 Key: LUCENE-8446
                 URL: https://issues.apache.org/jira/browse/LUCENE-8446
             Project: Lucene - Core
          Issue Type: Improvement
          Components: modules/highlighter
            Reporter: David Smiley
            Assignee: David Smiley


The UnifiedHighlighter's DefaultPassageFormatter (mostly unchanged from the old 
PostingsHighlighter) formats overlapping matches by closing a tag and 
immediately opening a new one.  I think this is a bit ugly structurally; it 
ought to continue the tag as if the matches were merged.  This is extremely 
rare in practice today since a match is always a single word, and thus we'd 
only see this behavior if multiple words at the same position but with 
different offsets are highlighted.  The advent of matches representing phrases 
will increase the probability of this, and indeed this was discovered while 
working on LUCENE-8286.  Additionally, and relatedly, OffsetsEnums should 
internally be ordered by end offset when the start offsets are equal.
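
To illustrate the proposed behavior, here is a minimal standalone sketch. It 
is not the actual DefaultPassageFormatter code and uses no Lucene classes: the 
class name MergedHighlightSketch, the format() helper, the int[]{start, end} 
match representation, and the hard-coded <b> tags are all assumptions made for 
illustration. It orders matches by start offset, then by end offset, and 
continues one tag across matches that overlap.

```java
import java.util.Arrays;
import java.util.Comparator;

class MergedHighlightSketch {

    /**
     * Hypothetical helper (not a Lucene API): wraps each run of overlapping
     * matches in a single <b>...</b> pair. Each match is {startOffset, endOffset},
     * with the end offset exclusive.
     */
    static String format(String content, int[][] matches) {
        // Order by start offset, then by end offset when the starts are equal.
        int[][] sorted = matches.clone();
        Arrays.sort(sorted,
            Comparator.<int[]>comparingInt(m -> m[0]).thenComparingInt(m -> m[1]));

        StringBuilder sb = new StringBuilder();
        int pos = 0; // next character of content not yet copied
        int i = 0;
        while (i < sorted.length) {
            int start = sorted[i][0];
            int end = sorted[i][1];
            // Continue the current tag while the next match starts before it ends.
            while (i + 1 < sorted.length && sorted[i + 1][0] < end) {
                end = Math.max(end, sorted[i + 1][1]);
                i++;
            }
            sb.append(content, pos, start);
            sb.append("<b>").append(content, start, end).append("</b>");
            pos = end;
            i++;
        }
        sb.append(content, pos, content.length());
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "the quick brown fox";
        // "quick brown" (4..15) overlaps "brown fox" (10..19).
        System.out.println(format(text, new int[][] {{4, 15}, {10, 19}}));
    }
}
```

In this sketch, matches that merely touch (one ends exactly where the next 
begins) are kept as separate tags; whether those should also be merged is a 
design choice the issue does not settle.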



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
