[
https://issues.apache.org/jira/browse/LUCENE-8286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16507691#comment-16507691
]
David Smiley commented on LUCENE-8286:
--------------------------------------
The first patch here is my work in progress. Everything compiles and the results
are generally reasonable, notwithstanding the known issues I pointed out in my
previous comment. I enabled it by default and then looked to see which
tests broke and why:
* TestUnifiedHighlighter: all failures are in the testFieldMatcher methods,
since the fieldMatcher mechanism doesn't yet work with this (mentioned in my
previous comment).
* TestUnifiedHighlighterMTQ.testWhichMTQMatched: because MatchesIterator
doesn't yet expose which term matched.
* TestUnifiedHighlighterRanking: failed because the scoring isn't the same
* TestUnifiedHighlighterTermVec.testFetchTermVecsOncePerDoc: randomly fails
because sometimes the underlying fields don't have a real index. The UH
highlights one field at a time and _that_ field being highlighted will be made
to appear as indexed if it wasn't already (e.g. re-analysis into MemoryIndex or
TV LeafReader wrapper) but no other fields will be. I think that once there is
a working solution for fieldMatcher, it may resolve this failure as well.
* TestUnifiedHighlighterStrictPhrases: I haven't reviewed each failure yet, but
they all seem to be due to the distinction between highlighting the words in a
phrase individually and highlighting the phrase as a single span. All the
assertions assume words highlighted individually.
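
To illustrate the distinction in the last item above, here is a minimal sketch in plain Java with no Lucene dependency. The class PhraseHighlightSketch, its Match type, and every method name are hypothetical and exist only for this illustration; they mimic the idea that a match may cover a range of positions instead of a single term, and they are not the patch's code or any Lucene API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; none of these types are Lucene classes. */
final class PhraseHighlightSketch {

    /** A match covering token positions startPos..endPos, both inclusive. */
    static final class Match {
        final int startPos;
        final int endPos;

        Match(int startPos, int endPos) {
            this.startPos = startPos;
            this.endPos = endPos;
        }
    }

    /** Finds exact occurrences of the phrase and reports each as ONE span. */
    static List<Match> phraseSpans(String[] tokens, String[] phrase) {
        List<Match> out = new ArrayList<>();
        if (phrase.length == 0) {
            return out; // an empty phrase matches nothing
        }
        for (int i = 0; i + phrase.length <= tokens.length; i++) {
            boolean hit = true;
            for (int j = 0; j < phrase.length; j++) {
                if (!tokens[i + j].equals(phrase[j])) {
                    hit = false;
                    break;
                }
            }
            if (hit) {
                out.add(new Match(i, i + phrase.length - 1));
            }
        }
        return out;
    }

    /** Splits each span into one single-token match per word. */
    static List<Match> perWord(List<Match> spans) {
        List<Match> out = new ArrayList<>();
        for (Match m : spans) {
            for (int p = m.startPos; p <= m.endPos; p++) {
                out.add(new Match(p, p));
            }
        }
        return out;
    }

    /** Wraps every match in <b>...</b>; tokens are joined by single spaces. */
    static String render(String[] tokens, List<Match> matches) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < tokens.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            for (Match m : matches) {
                if (m.startPos == i) sb.append("<b>");
            }
            sb.append(tokens[i]);
            for (Match m : matches) {
                if (m.endPos == i) sb.append("</b>");
            }
        }
        return sb.toString();
    }
}
```

For the tokens "the quick brown fox" and the phrase "quick brown", rendering the result of phraseSpans wraps the two words in one tag pair, while rendering perWord of the same spans wraps each word separately. Assertions written against the second form fail against the first even though the same words are matched.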
What's cool is that this wasn't a big change, and it can be intermixed with
SpanQueries. I need to look at the scoring options more -- loss of freq() is a
shame.
> UnifiedHighlighter should support the new Weight.matches API for better match
> accuracy
> --------------------------------------------------------------------------------------
>
> Key: LUCENE-8286
> URL: https://issues.apache.org/jira/browse/LUCENE-8286
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/highlighter
> Reporter: David Smiley
> Priority: Major
> Attachments: LUCENE-8286.patch
>
>
> The new Weight.matches() API should allow the UnifiedHighlighter to
> highlight some BooleanQuery patterns more accurately -- see LUCENE-7903.
> In addition, this API should make the job of highlighting easier, reducing
> the LOC and related complexities, especially the UH's PhraseHelper. Note:
> reducing/removing PhraseHelper is not a near-term goal since Weight.matches
> is experimental and incomplete, and perhaps we'll discover some gaps in
> flexibility/functionality.
> This issue should introduce a new UnifiedHighlighter.HighlightFlag enum
> option for this method of highlighting. Perhaps call it {{WEIGHT_MATCHES}}?
> Longer term it could go away and it'll be implied if you specify enum values
> for PHRASES & MULTI_TERM_QUERY?