[jira] [Commented] (LUCENE-8286) UnifiedHighlighter should support the new Weight.matches API for better match accuracy

Jim Ferenczi (JIRA) Wed, 02 May 2018 07:37:31 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-8286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16461105#comment-16461105
 ]


Jim Ferenczi commented on LUCENE-8286:
--------------------------------------

I also think that it would greatly simplify the code (especially PhraseHelper 
;) ), but the matches API requires some changes before it can serve as a 
replacement. First of all, there is no way to retrieve the term/query from the 
matches iterator, so it's not possible to count the number of occurrences of a 
specific query or the total frequency in the document. This information is 
needed to compute the score of a passage, so we need to add something to 
matches.
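
To illustrate what is missing, here is a rough sketch of the counting that 
passage scoring would need. It assumes each match could report the term it came 
from. LabeledMatch and MatchFrequencies are hypothetical names for this example 
only, not Lucene classes, and the current matches iterator does not expose a 
term.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Lucene.
final class MatchFrequencies {

    /** Hypothetical match that also carries the term it came from. */
    static final class LabeledMatch {
        final String term;
        final int startOffset;
        final int endOffset;

        LabeledMatch(String term, int startOffset, int endOffset) {
            this.term = term;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }
    }

    /** Counts how many matches each term produced, in first-seen order. */
    static Map<String, Integer> countByTerm(List<LabeledMatch> matches) {
        Map<String, Integer> freq = new LinkedHashMap<>();
        for (LabeledMatch m : matches) {
            freq.merge(m.term, 1, Integer::sum);
        }
        return freq;
    }
}
```

Without a term (or query) on each match, the grouping key used above is simply 
not available.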
The matches iterator can return duplicates (if the same term is present in 
multiple clauses) and will soon be able to return matches from phrases (rather 
than individual terms). This means that we'll need to detect overlapping 
intervals when the passages are built. I see this as an improvement since it 
would allow us to highlight entire phrases, but for spans we'll need an option 
to split the match interval: a span near (or any other span query) can have 
big gaps, so it would not make sense to highlight the entire match as a single 
highlight.
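
A minimal sketch of the overlap detection, assuming matches are reduced to 
half-open character offset ranges [start, end). MatchIntervals and Interval are 
hypothetical names for this example, not Lucene classes, and the sketch ignores 
the span-splitting option mentioned above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not part of Lucene.
final class MatchIntervals {

    /** A match as a half-open character offset range [start, end). */
    static final class Interval {
        final int start;
        final int end;

        Interval(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    /**
     * Returns the matches sorted by start offset, with duplicates and
     * overlapping ranges collapsed into a single range. Ranges that only
     * touch (one ends where the next starts) are kept separate.
     */
    static List<Interval> merge(List<Interval> matches) {
        List<Interval> sorted = new ArrayList<>(matches);
        sorted.sort(Comparator.comparingInt((Interval i) -> i.start)
                              .thenComparingInt(i -> i.end));

        List<Interval> merged = new ArrayList<>();
        for (Interval cur : sorted) {
            int lastIdx = merged.size() - 1;
            if (lastIdx >= 0 && cur.start < merged.get(lastIdx).end) {
                // Overlaps the previous range: extend it instead of adding a new one.
                Interval last = merged.get(lastIdx);
                merged.set(lastIdx, new Interval(last.start, Math.max(last.end, cur.end)));
            } else {
                merged.add(cur);
            }
        }
        return merged;
    }
}
```

A duplicate term match and a phrase match covering that term would both 
collapse into one highlight this way.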
One thing we could do to simplify the transition is to remove OffsetsEnum 
entirely and replace it with the MatchesIterator. Apart from the missing bits 
I described above, this should be easy to do.


> UnifiedHighlighter should support the new Weight.matches API for better match 
> accuracy
> --------------------------------------------------------------------------------------
>
>                 Key: LUCENE-8286
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8286
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/highlighter
>            Reporter: David Smiley
>            Priority: Major
>
> The new Weight.matches() API should allow the UnifiedHighlighter to 
> highlight some BooleanQuery patterns more accurately -- see LUCENE-7903.
> In addition, this API should make the job of highlighting easier, reducing 
> the LOC and related complexities, especially the UH's PhraseHelper. Note: 
> reducing/removing PhraseHelper is not a near-term goal since Weight.matches 
> is experimental and incomplete, and perhaps we'll discover some gaps in 
> flexibility/functionality.
> This issue should introduce a new UnifiedHighlighter.HighlightFlag enum 
> option for this method of highlighting. Perhaps call it {{WEIGHT_MATCHES}}? 
> Longer term it could go away and be implied if you specify the enum values 
> for PHRASES & MULTI_TERM_QUERY?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

