[jira] [Commented] (LUCENE-8286) UnifiedHighlighter should support the new Weight.matches API for better match accuracy

David Smiley (JIRA) Sun, 10 Jun 2018 22:20:50 -0700


    [ 
https://issues.apache.org/jira/browse/LUCENE-8286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16507691#comment-16507691
 ]


David Smiley commented on LUCENE-8286:
--------------------------------------

The first patch here is a work in progress.  Everything compiles and the results 
are generally reasonable, notwithstanding some known issues already pointed out 
in my previous comment.  I enabled it by default and then looked to see which 
tests broke and why:

* TestUnifiedHighlighter: all failures are in the testFieldMatcher methods, 
since the fieldMatcher mechanism doesn't yet work with this (mentioned in my 
previous comment).
* TestUnifiedHighlighterMTQ.testWhichMTQMatched: because MatchesIterator 
doesn't yet expose which term matched.
* TestUnifiedHighlighterRanking: failed because the scoring isn't the same
* TestUnifiedHighlighterTermVec.testFetchTermVecsOncePerDoc: randomly fails 
because sometimes the underlying fields don't have a real index.  The UH 
highlights one field at a time and _that_ field being highlighted will be made 
to appear as indexed if it wasn't already (e.g. re-analysis into MemoryIndex or 
TV LeafReader wrapper) but no other fields will be.  I think once a solution to 
fieldMatcher works, it may solve the situation here.
* TestUnifiedHighlighterStrictPhrases: I haven't reviewed each failure yet, but 
they all seem to be due to the distinction between highlighting the words of a 
phrase individually and highlighting the whole phrase span.  All the assertions 
assume the words are highlighted individually.
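
To make the StrictPhrases distinction concrete, here is a minimal standalone 
sketch.  It does not use the Lucene API at all: the Match class, highlight() and 
span() are hypothetical names made up for illustration, and the <b> tags are 
just an assumed output format.  It shows the same two word matches rendered 
once word by word and once as a single phrase span.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class PhraseHighlightSketch {

    /** Hypothetical holder for one matched region, as [start, end) character offsets. */
    static final class Match {
        final int start;
        final int end;

        Match(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    /** Wraps each match in <b>...</b>; assumes matches are sorted and do not overlap. */
    static String highlight(String text, List<Match> matches) {
        StringBuilder sb = new StringBuilder();
        int last = 0;
        for (Match m : matches) {
            sb.append(text, last, m.start);      // unhighlighted text before the match
            sb.append("<b>").append(text, m.start, m.end).append("</b>");
            last = m.end;
        }
        sb.append(text, last, text.length());    // trailing unhighlighted text
        return sb.toString();
    }

    /** Collapses the word matches of one phrase into a single match spanning all of them. */
    static Match span(List<Match> words) {
        return new Match(words.get(0).start, words.get(words.size() - 1).end);
    }

    public static void main(String[] args) {
        String text = "the quick brown fox";
        // Offsets of "quick" and "brown", as a phrase query "quick brown" might report them.
        List<Match> words = Arrays.asList(new Match(4, 9), new Match(10, 15));

        // Words highlighted by themselves (what the existing assertions assume).
        System.out.println(highlight(text, words));
        // The whole phrase highlighted as one span.
        System.out.println(highlight(text, Collections.singletonList(span(words))));
    }
}
```

The first call yields "the <b>quick</b> <b>brown</b> fox" and the second 
"the <b>quick brown</b> fox", which is the kind of difference the assertions in 
that test class would trip over.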

What's cool is that this wasn't a big change, and it can be intermixed with 
SpanQueries.  I need to look at the scoring options more -- loss of freq() is a 
shame.

> UnifiedHighlighter should support the new Weight.matches API for better match 
> accuracy
> --------------------------------------------------------------------------------------
>
>                 Key: LUCENE-8286
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8286
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/highlighter
>            Reporter: David Smiley
>            Priority: Major
>         Attachments: LUCENE-8286.patch
>
>
> The new Weight.matches() API should allow the UnifiedHighlighter to highlight 
> some BooleanQuery patterns more accurately -- see LUCENE-7903.
> In addition, this API should make the job of highlighting easier, reducing 
> the LOC and related complexities, especially the UH's PhraseHelper.  Note: 
> reducing/removing PhraseHelper is not a near-term goal since Weight.matches 
> is experimental and incomplete, and perhaps we'll discover some gaps in 
> flexibility/functionality.
> This issue should introduce a new UnifiedHighlighter.HighlightFlag enum 
> option for this method of highlighting.   Perhaps call it {{WEIGHT_MATCHES}}? 
>  Longer term it could go away and it'll be implied if you specify enum values 
> for PHRASES & MULTI_TERM_QUERY?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

