[ https://issues.apache.org/jira/browse/LUCENE-8286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16593692#comment-16593692 ]

David Smiley commented on LUCENE-8286:
--------------------------------------

_The PR is ready to go, I think._ I'll commit in a couple of days.

OE.getTerm (OffsetsEnum) is implemented consistently with the other OffsetsEnum implementations.

I also tracked down a curious observation in one of the tests (*not* for 
MatchesIterator): sloppy phrase queries sometimes don't highlight faithfully 
to the original query, because WeightedSpanTermExtractor sets inOrder=false 
when it converts a PhraseQuery with slop into a SpanQuery. An unordered span 
can match the terms in reverse order at a slop where the original phrase would 
not. This just goes to show that MatchesIterator-based highlighting is more 
accurate in multiple ways.
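To make the slop mismatch concrete, here is a simplified two-term model. It is an assumption for illustration only, not Lucene's PhraseQuery or SpanNearQuery code, and it ignores scoring, position gaps, and repeated terms. The class name SlopModel is hypothetical. It compares a sloppy-phrase rule (slop is the distance from where the second term is expected) with an unordered span rule (slop is only the gap between the terms).

```java
/**
 * Hypothetical, simplified model (NOT Lucene code) of how a two-term query
 * "a b" with a given slop matches term a at document position pa and term b
 * at document position pb. Assumes pa != pb and that a and b are different terms.
 */
final class SlopModel {

    // Sloppy-phrase style: the phrase expects b right after a (at pa + 1);
    // slop bounds how far b may be from that expected position.
    // Reversing two adjacent terms therefore costs 2.
    static boolean phraseMatches(int pa, int pb, int slop) {
        return Math.abs(pb - (pa + 1)) <= slop;
    }

    // Unordered span-near style: only the number of positions between the two
    // terms counts; which term comes first is ignored.
    static boolean unorderedSpanMatches(int pa, int pb, int slop) {
        return Math.abs(pb - pa) - 1 <= slop;
    }

    public static void main(String[] args) {
        // Document "fox quick": fox at position 0, quick at position 1.
        // Query "quick fox"~1: a = quick (pa = 1), b = fox (pb = 0).
        System.out.println(phraseMatches(1, 0, 1));        // false: reversal needs slop 2
        System.out.println(unorderedSpanMatches(1, 0, 1)); // true: the terms are adjacent
    }
}
```

In this model the unordered span accepts the reversed pair at slop 1 while the sloppy phrase rejects it, which is the kind of over-highlighting an inOrder=false conversion can produce.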

Suggested CHANGES.txt:
The UnifiedHighlighter has a new experimental HighlightFlag.WEIGHT_MATCHES 
flag that makes it use Lucene's new Weight.matches API. This highlights more 
accurately and strictly, solving issues like LUCENE-7903. Each phrase 
occurrence is highlighted as a single span instead of word by word. Passage 
relevancy might be degraded, however, since "freq" isn't calculated. The flag 
is disabled by default. Some APIs that are public but internal to the UH 
changed, including a new UHComponents class.
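The formatting difference can be pictured with a small sketch. It is hypothetical: SpanFormatSketch is not the UH's PassageFormatter, and the character offsets are picked by hand rather than taken from a real query's matches. It wraps each [start, end) match in tags, once with one match per word and once with one match per phrase occurrence.

```java
import java.util.List;

/** Hypothetical sketch (not the UnifiedHighlighter's formatter): wraps match offsets in <b> tags. */
final class SpanFormatSketch {

    /** One match as a half-open character-offset range [start, end). */
    static final class Match {
        final int start;
        final int end;

        Match(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    /** Assumes matches are sorted by start offset and do not overlap. */
    static String format(String text, List<Match> matches) {
        StringBuilder sb = new StringBuilder();
        int pos = 0;
        for (Match m : matches) {
            sb.append(text, pos, m.start);                                  // text before the match
            sb.append("<b>").append(text, m.start, m.end).append("</b>");  // the match itself
            pos = m.end;
        }
        sb.append(text, pos, text.length());                                // trailing text
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "the quick fox jumps";
        // Word-by-word: "quick" = [4, 9) and "fox" = [10, 13) highlighted separately.
        System.out.println(format(text, List.of(new Match(4, 9), new Match(10, 13))));
        // One span per phrase occurrence: "quick fox" = [4, 13).
        System.out.println(format(text, List.of(new Match(4, 13))));
    }
}
```

The first call prints `the <b>quick</b> <b>fox</b> jumps` and the second prints `the <b>quick fox</b> jumps`, mirroring the word-by-word versus one-span-per-occurrence behavior described above.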

> UnifiedHighlighter should support the new Weight.matches API for better match 
> accuracy
> --------------------------------------------------------------------------------------
>
>                 Key: LUCENE-8286
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8286
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/highlighter
>            Reporter: David Smiley
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The new Weight.matches() API should allow the UnifiedHighlighter to highlight 
> some BooleanQuery patterns more accurately (see LUCENE-7903).
> In addition, this API should make the job of highlighting easier, reducing 
> the LOC and related complexities, especially the UH's PhraseHelper.  Note: 
> reducing/removing PhraseHelper is not a near-term goal since Weight.matches 
> is experimental and incomplete, and perhaps we'll discover some gaps in 
> flexibility/functionality.
> This issue should introduce a new UnifiedHighlighter.HighlightFlag enum 
> option for this method of highlighting. Perhaps call it {{WEIGHT_MATCHES}}? 
> Longer term, it could go away and be implied if you specify the PHRASES and 
> MULTI_TERM_QUERY enum values?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
