[ 
https://issues.apache.org/jira/browse/LUCENE-7757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947451#comment-15947451
 ] 

David Smiley commented on LUCENE-7757:
--------------------------------------

I got to the bottom of this one; it's tricky.  I see two issues:

1. The UH's {{PhraseHelper}} uses {{WeightedSpanTermExtractor}} to convert the 
query to a {{SpanQuery}}.  WSTE has no knowledge of {{ComplexPhraseQuery}}, so 
it has some fallback logic.  {{PhraseHelper}} overrides {{isQueryUnsupported}} 
but it has a lingering TODO with a return true, thus any query not known in 
advance is not going to be highlighted.  I think this should be modified to 
return false.  I did that locally and then also found it necessary to 
override {{getLeafContext()}} to return a dummy context.  The PH can't produce 
a real leaf context (here) because this is the stage at which it is merely 
analyzing the query; no wildcard expansion has been done (yet).  The query 
worked in the original Highlighter because there is no split phase.

2. {{ComplexPhraseQueryParser}} produces a special Query subclass 
{{ComplexPhraseQuery}}.  CPQ implements rewrite() that also calls rewrite() on 
the clauses.  It expects a _real_ (not a dummy) leaf context.  So this works 
from a query execution standpoint, but I think it would be more friendly with 
the UH if CPQ didn't cascade the rewrite.  It's not a simple matter of 
commenting out the cascaded rewrite though... I will investigate further when I 
have more time.
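
To make point 1 concrete, here is a minimal, self-contained sketch of the 
control flow as described above.  It is only a model: every type in it 
({{Query}}, {{UnknownQuery}}, {{LeafContext}}, {{Extractor}}) is a hypothetical 
stand-in, not the real Lucene class, and the actual WSTE fallback logic is more 
involved.  It just shows why a "return true" in {{isQueryUnsupported}} yields no 
highlights, and why flipping it to false also requires some leaf context.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model only: every type here is a hypothetical stand-in,
// not the real Lucene Query, LeafReaderContext, or WeightedSpanTermExtractor.
class PhraseHelperSketch {

    /** Stand-in for a query type the extractor knows how to convert. */
    static class Query {
        final String text;
        Query(String text) { this.text = text; }
    }

    /** Stand-in for a type the extractor does not know in advance (like CPQ). */
    static class UnknownQuery extends Query {
        UnknownQuery(String text) { super(text); }
    }

    /** Stand-in for a leaf context; the dummy variant has no index behind it. */
    static class LeafContext {
        final boolean dummy;
        LeafContext(boolean dummy) { this.dummy = dummy; }
    }

    /** Models an extractor with fallback logic for unknown query types. */
    static class Extractor {
        protected boolean isQueryUnsupported(Class<? extends Query> clazz) {
            return false;
        }

        protected LeafContext getLeafContext() {
            // While the query is merely being analyzed there is no reader yet.
            throw new IllegalStateException("no leaf context available");
        }

        List<String> extract(Query query) {
            List<String> terms = new ArrayList<>();
            if (query.getClass() == Query.class) {
                terms.add(query.text);   // known type: convert directly
            } else if (isQueryUnsupported(query.getClass())) {
                // declared unsupported: skipped, so nothing gets highlighted
            } else {
                getLeafContext();        // fallback path asks for a context first
                terms.add(query.text);
            }
            return terms;
        }
    }

    /** Behaviour described in the comment: the TODO that returns true. */
    static List<String> currentBehaviour(Query query) {
        return new Extractor() {
            @Override
            protected boolean isQueryUnsupported(Class<? extends Query> clazz) {
                return true;
            }
        }.extract(query);
    }

    /** Proposed change: return false and supply a dummy context. */
    static List<String> proposedBehaviour(Query query) {
        return new Extractor() {
            @Override
            protected boolean isQueryUnsupported(Class<? extends Query> clazz) {
                return false;
            }

            @Override
            protected LeafContext getLeafContext() {
                return new LeafContext(true);
            }
        }.extract(query);
    }

    public static void main(String[] args) {
        Query q = new UnknownQuery("sve* no*");
        System.out.println("current:  " + currentBehaviour(q));
        System.out.println("proposed: " + proposedBehaviour(q));
    }
}
```

In this model the "current" variant returns an empty list for the unknown 
query, while the "proposed" variant passes it through the fallback path.  
Point 2 is not modelled here: a rewrite that cascades to its clauses and 
expects a real leaf context would still be a problem with a dummy one.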

> Unified highlighter does not highlight wildcard phrases correctly when 
> ComplexPhraseQueryParser is used
> -------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-7757
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7757
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: modules/highlighter
>    Affects Versions: 6.4
>            Reporter: Bjarke Mortensen
>            Assignee: David Smiley
>
> Given the text:
> "Kontraktsproget vil være dansk og arbejdssproget kan være dansk, svensk, 
> norsk og engelsk"
> and the query:
> \{!complexphrase df=content_da\}("sve* no*")
> the unified highlighter (hl.method=unified) does not return any highlights.
> For reference, the original highlighter returns a snippet with the expected 
> highlights:
> Kontraktsproget vil være dansk og arbejdssproget kan være dansk, 
> <em>svensk</em>, <em>norsk</em> og
> Is this expected behaviour with the unified highlighter?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
