[
https://issues.apache.org/jira/browse/LUCENE-8286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16578835#comment-16578835
]
David Smiley commented on LUCENE-8286:
--------------------------------------
Actually, before continuing with any of that, I think the PR is *almost* good
enough for this new mode. It's not on by default, so you have to opt in. If
you do opt in, you get:
* Better matching accuracy, particularly with nested conjunction/disjunction
(solving LUCENE-7903).
* Phrase queries will have highlights spanning more naturally instead of
per-term. Cosmetic but nice. SpanQuery nested stuff is as-before in this
regard, though.
* Passage scoring won't be as good due to a constant freq(). Some users won't
care; arguably diversity of terms is more important, particularly in a snippet.
Note that consideration of freq() can be dialed down to nothing by setting the
"k1" BM25 param of DefaultPassageScorer to 0, and this is in fact tested as
having such an effect.
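To illustrate the k1 point, here is a minimal standalone sketch, not the
highlighter's actual code. It assumes the passage scorer saturates term
frequency BM25-style as freq / (freq + k1 * lengthNorm). The class name
TfSaturationSketch, the method signature, and the b and pivot values are
illustrative assumptions.

```java
/** Standalone sketch of BM25-style term-frequency saturation (hypothetical class, not Lucene code). */
public class TfSaturationSketch {

    /**
     * Saturated term frequency for a passage.
     * k1 controls how much freq matters; b and pivot control length normalization.
     */
    public static float tf(int freq, int passageLen, float k1, float b, float pivot) {
        float norm = k1 * ((1 - b) + b * (passageLen / pivot));
        return freq / (freq + norm);
    }

    public static void main(String[] args) {
        float b = 0.75f, pivot = 87f; // assumed values, for illustration only
        int passageLen = 100;
        // With k1 = 0 the norm term vanishes, so any freq >= 1 gives the same value.
        System.out.println(tf(1, passageLen, 0f, b, pivot));
        System.out.println(tf(5, passageLen, 0f, b, pivot));
        // With k1 > 0 a higher freq gives a higher (but bounded) value.
        System.out.println(tf(1, passageLen, 1.2f, b, pivot));
        System.out.println(tf(5, passageLen, 1.2f, b, pivot));
    }
}
```

Under that assumption, k1 = 0 makes a constant freq() harmless: the tf factor
is the same for every matching term, so only the per-term weights (i.e. term
diversity) differentiate passages.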
The only thing needed that isn't too disruptive is implementing
OffsetsEnum.getTerm(). The current nocommit of an empty term is bad because
it's also used in DefaultPassageScorer to calculate per-term stats. So this
ought to return the actual term, although it would be fine if it were really
just the query's toString(). In the interest of getting an experimental
feature out the door, I think I'll do the latter. Only someone customizing the
scorer or formatter in a way to depend on the nature of the term would be
impacted by that.
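A rough standalone sketch of why an empty term hurts per-term stats while a
query.toString() label is good enough. Everything here is hypothetical: the
class TermLabelSketch, the countByLabel helper, and the example label strings
stand in for whatever getTerm() would return. It does not use the real
OffsetsEnum or scorer classes.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: per-term bookkeeping keyed by whatever getTerm() returns. */
public class TermLabelSketch {

    /** Counts matches per label, the way a scorer might group matches before weighting each term. */
    public static Map<String, Integer> countByLabel(List<String> labels) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String label : labels) {
            counts.merge(label, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Empty labels: matches from different queries collapse into a single bucket.
        System.out.println(countByLabel(Arrays.asList("", "", "")));
        // Query-string labels (made-up examples): each query keeps its own bucket.
        System.out.println(countByLabel(
            Arrays.asList("body:\"quick fox\"", "body:lazy", "body:\"quick fox\"")));
    }
}
```

The label only needs to be distinct per query for the stats to stay separate;
it does not have to be a real indexed term unless a custom scorer or formatter
interprets it as one.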
> UnifiedHighlighter should support the new Weight.matches API for better match
> accuracy
> --------------------------------------------------------------------------------------
>
> Key: LUCENE-8286
> URL: https://issues.apache.org/jira/browse/LUCENE-8286
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/highlighter
> Reporter: David Smiley
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> The new Weight.matches() API should allow the UnifiedHighlighter to
> highlight some BooleanQuery patterns more accurately -- see LUCENE-7903.
> In addition, this API should make the job of highlighting easier, reducing
> the LOC and related complexities, especially the UH's PhraseHelper. Note:
> reducing/removing PhraseHelper is not a near-term goal since Weight.matches
> is experimental and incomplete, and perhaps we'll discover some gaps in
> flexibility/functionality.
> This issue should introduce a new UnifiedHighlighter.HighlightFlag enum
> option for this method of highlighting. Perhaps call it {{WEIGHT_MATCHES}}?
> Longer term it could go away and it'll be implied if you specify enum values
> for PHRASES & MULTI_TERM_QUERY?