[ 
https://issues.apache.org/jira/browse/LUCENE-8848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16860110#comment-16860110
 ] 

David Smiley commented on LUCENE-8848:
--------------------------------------

This patch will resolve linked issues relating to both the Surround query 
parser and ComplexPhrase query parser.

> UnifiedHighlighter should highlight all Query types that implement 
> Weight.matches
> ---------------------------------------------------------------------------------
>
>                 Key: LUCENE-8848
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8848
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/highlighter
>            Reporter: David Smiley
>            Priority: Major
>         Attachments: LUCENE-8848.patch
>
>
> The UnifiedHighlighter internally extracts terms and automata from the query. 
>  Usually this works perfectly but it's possible a Query might be of a type it 
> doesn't know -- a leaf query that is perhaps in effect similar to a 
> MultiTermQuery yet it might not even be a subclass of this or it does but the 
> UH doesn't know how to extract an automata from it.  The UH is oblivious to 
> this and probably won't highlight this query.  If re-analysis of the text is 
> necessary, the UH will pre-filter all terms to only those it _thinks_ are 
> pertinent.  Or if offsets are in the postings then the UH could perform very 
> poorly by unleashing this query on the index for each highlighted document 
> without recognizing re-analysis is a more appropriate path.
> I think to solve this, the UnifiedHighlighter.getFieldHighlighter needs to 
> inspect the query (using a QueryVisitor) to see if it can find a leaf query 
> that is not one it knows how to pull automata from, and is otherwise not in a 
> special list (like MatchAllDocsQuery).  If we find one, we avoid choosing 
> OffsetSource.POSTINGS or OffsetSource.NONE_NEEDED since we might in effect 
> have an MTQ like query.  If a MemoryIndex is needed then we don't pre-filter 
> the terms since we can't assume we know precisely which terms are pertinent.
> We needn't bother extracting terms & automata in this case either; it's 
> wasted effort which can involve building a CharacterRunAutomaton (see 
> MultiTermHighlighting.binaryToCharRunAutomaton).  Speaking of which, it'd be 
> nice to avoid that in other cases as well, like for WEIGHT_MATCHES when we 
> aren't using MemoryIndex (thus no term pre-filtering).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Reply via email to