Timothy M. Rodriguez created LUCENE-7438:
--------------------------------------------

             Summary: UnifiedHighlighter
                 Key: LUCENE-7438
                 URL: https://issues.apache.org/jira/browse/LUCENE-7438
             Project: Lucene - Core
          Issue Type: Improvement
          Components: modules/highlighter
    Affects Versions: 6.2
            Reporter: Timothy M. Rodriguez


The UnifiedHighlighter is an evolution of the PostingsHighlighter that is able 
to highlight using offsets in either postings, term vectors, or from analysis 
(a TokenStream). Lucene’s existing highlighters are mostly divided according to 
their offset source, whereas this one unifies them, hence the proposed name. In 
this highlighter, the offset source strategy is separated from the core 
highlighting functionality. The UnifiedHighlighter further improves on the 
PostingsHighlighter’s design by supporting accurate phrase highlighting using 
an approach similar to the standard highlighter’s WeightedSpanTermExtractor. 
The next major improvement is a hybrid offset source strategy that utilizes 
postings and “light” term vectors (i.e. just the terms) for highlighting 
multi-term queries (wildcards) without resorting to analysis. Phrase 
highlighting and wildcard highlighting can both be disabled if you’d rather 
highlight a little faster at the cost of reflecting the query less accurately.

We’ve benchmarked an earlier version of this highlighter against the other 
highlighters and the results were exciting! It’s tempting to share those 
results, but the benchmark is due to be rerun, so we’ll work on that. 
Performance was the main motivator for creating the UnifiedHighlighter: the 
standard Highlighter (the only one meeting Bloomberg Law’s accuracy 
requirements) wasn’t fast enough, even with term vectors and several 
improvements we contributed back, and even after we forked it to highlight in 
multiple threads.
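
To illustrate the separation of the offset source strategy from the core 
highlighting logic described above, here is a minimal, dependency-free sketch 
in plain Java. It does not use Lucene and is not the proposed implementation: 
the names OffsetSource, AnalysisOffsetSource, StoredOffsetSource and highlight 
are all hypothetical, and the "stored" source merely stands in for offsets 
that a real index would keep in postings or term vectors.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

final class OffsetSourceSketch {

    /** Character span of one matching term (hypothetical type). */
    static final class Offset {
        final int start;
        final int end;
        Offset(int start, int end) { this.start = start; this.end = end; }
    }

    /** Hypothetical strategy: where the match offsets come from. */
    interface OffsetSource {
        /** Returns offsets of query terms in the text, ordered by start. */
        List<Offset> offsets(String text, Set<String> queryTerms);
    }

    /** Stand-in for "analysis": re-tokenize the text at highlight time. */
    static final class AnalysisOffsetSource implements OffsetSource {
        @Override
        public List<Offset> offsets(String text, Set<String> queryTerms) {
            List<Offset> out = new ArrayList<>();
            int i = 0;
            int n = text.length();
            while (i < n) {
                while (i < n && !Character.isLetterOrDigit(text.charAt(i))) i++;
                int start = i;
                while (i < n && Character.isLetterOrDigit(text.charAt(i))) i++;
                if (i > start) {
                    String term = text.substring(start, i).toLowerCase(Locale.ROOT);
                    if (queryTerms.contains(term)) out.add(new Offset(start, i));
                }
            }
            return out;
        }
    }

    /** Stand-in for postings or term vectors: offsets recorded ahead of time. */
    static final class StoredOffsetSource implements OffsetSource {
        private final Map<String, List<Offset>> stored;
        StoredOffsetSource(Map<String, List<Offset>> stored) { this.stored = stored; }

        @Override
        public List<Offset> offsets(String text, Set<String> queryTerms) {
            List<Offset> out = new ArrayList<>();
            for (String term : queryTerms) {
                out.addAll(stored.getOrDefault(term, List.of()));
            }
            out.sort(Comparator.comparingInt(o -> o.start));
            return out;
        }
    }

    /** Core highlighting: identical regardless of which source is plugged in. */
    static String highlight(String text, Set<String> queryTerms, OffsetSource source) {
        StringBuilder sb = new StringBuilder();
        int last = 0;
        for (Offset o : source.offsets(text, queryTerms)) {
            sb.append(text, last, o.start)
              .append("<b>").append(text, o.start, o.end).append("</b>");
            last = o.end;
        }
        return sb.append(text.substring(last)).toString();
    }

    public static void main(String[] args) {
        String text = "The quick brown fox";
        Set<String> terms = Set.of("quick", "fox");
        OffsetSource stored = new StoredOffsetSource(Map.of(
                "quick", List.of(new Offset(4, 9)),
                "fox", List.of(new Offset(16, 19))));
        System.out.println(highlight(text, terms, new AnalysisOffsetSource()));
        System.out.println(highlight(text, terms, stored));
    }
}
```

Both calls in main produce the same marked-up string; only the way the offsets 
are obtained differs, which is the point of keeping the strategy separate from 
the passage-building code.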


