[jira] [Commented] (LUCENE-8286) UnifiedHighlighter should support the new Weight.matches API for better match accuracy

David Smiley (JIRA) Mon, 13 Aug 2018 12:51:17 -0700


    [ 
https://issues.apache.org/jira/browse/LUCENE-8286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16578835#comment-16578835
 ]


David Smiley commented on LUCENE-8286:
--------------------------------------

Actually, before continuing with any of that, I think the PR is *almost* good 
enough for this new mode.  It's not on by default, so you have to opt in.  If 
you do opt in, you get:
* Better matching accuracy, particularly with nested conjunction/disjunction 
(solving LUCENE-7903).
* Phrase queries will have highlights spanning more naturally instead of 
per-term.  Cosmetic but nice.  Nested SpanQuery handling is unchanged in this 
regard, though.
* Passage scoring won't be as good due to a constant freq().  Some users won't 
care; arguably diversity of terms is more important, particularly in a snippet. 
Note that the influence of freq() can be dialed down to nothing by setting the 
"k1" BM25 param of DefaultPassageScorer to 0, and there is a test confirming 
that effect.
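
To illustrate the "k1" point, here is a minimal standalone sketch. It assumes 
the scorer applies the usual BM25-style saturation, freq / (freq + k1 * 
lengthNorm).  The class name, method signature and parameter values below are 
illustrative only; this is not the Lucene class itself.

```java
// Standalone sketch, NOT the Lucene DefaultPassageScorer.
// Assumption: term frequency is saturated BM25-style as freq / (freq + norm),
// where norm = k1 * ((1 - b) + b * passageLen / pivot).
final class PassageTfSketch {

    /**
     * Returns the saturated term-frequency factor for one term in a passage.
     * freq must be greater than 0; with k1 == 0 and freq == 0 the result is NaN.
     */
    static float tf(int freq, int passageLen, float k1, float b, float pivot) {
        float norm = k1 * ((1 - b) + b * (passageLen / pivot));
        return freq / (freq + norm);
    }

    public static void main(String[] args) {
        // Illustrative values, chosen only for the demonstration.
        float b = 0.75f;
        float pivot = 87f;
        int passageLen = 120;

        // With a non-zero k1, a higher freq yields a higher factor.
        System.out.println("k1=1.2 freq=1: " + tf(1, passageLen, 1.2f, b, pivot));
        System.out.println("k1=1.2 freq=5: " + tf(5, passageLen, 1.2f, b, pivot));

        // With k1 = 0 the norm vanishes, so any positive freq gives the same factor.
        System.out.println("k1=0   freq=1: " + tf(1, passageLen, 0f, b, pivot));
        System.out.println("k1=0   freq=5: " + tf(5, passageLen, 0f, b, pivot));
    }
}
```

Under that assumption, setting k1 to 0 makes the factor independent of freq, so 
a constant freq() no longer changes how passages rank against each other.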

The only thing still needed, and it isn't too disruptive, is implementing 
OffsetsEnum.getTerm().  The current nocommit of returning an empty term is bad 
because the term is also used in DefaultPassageScorer to calculate per-term 
stats.  So this ought to return the actual term, although it'd be fine if it 
was really a query.toString().  In the interest of getting an experimental 
feature out the door, I think I'll do the latter.  Only someone customizing the 
scorer or formatter in a way that depends on the nature of the term would be 
impacted by that.

> UnifiedHighlighter should support the new Weight.matches API for better match 
> accuracy
> --------------------------------------------------------------------------------------
>
>                 Key: LUCENE-8286
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8286
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/highlighter
>            Reporter: David Smiley
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The new Weight.matches() API should allow the UnifiedHighlighter to highlight 
> some BooleanQuery patterns more accurately -- see LUCENE-7903.
> In addition, this API should make the job of highlighting easier, reducing 
> the LOC and related complexities, especially the UH's PhraseHelper.  Note: 
> reducing/removing PhraseHelper is not a near-term goal since Weight.matches 
> is experimental and incomplete, and perhaps we'll discover some gaps in 
> flexibility/functionality.
> This issue should introduce a new UnifiedHighlighter.HighlightFlag enum 
> option for this method of highlighting.   Perhaps call it {{WEIGHT_MATCHES}}? 
>  Longer term it could go away and it'll be implied if you specify enum values 
> for PHRASES & MULTI_TERM_QUERY?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

