Hi David,

Thanks for your quick help! We ended up making our custom query extend Query
and wrap a MultiPhraseQuery instance instead of extending MPQ. This way,
WeightedSpanTermExtractor no longer takes its MPQ branch and instead reaches the
generic branch that calls rewrite(IndexReader).
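A minimal sketch of the wrap-instead-of-extend approach. It uses small hypothetical stand-ins (`ToyQuery`, `ToyMultiPhraseQuery`, `ToyReader`) in place of Lucene's real `Query`, `MultiPhraseQuery` and `IndexReader`, so it compiles without Lucene on the classpath. `WrappingPhraseQuery` and its prefix-expanding rewrite are illustrative assumptions, not the actual custom query from this thread:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for IndexReader: just a list of indexed terms.
class ToyReader {
    private final List<String> terms;

    ToyReader(String... terms) { this.terms = Arrays.asList(terms); }

    // Returns every indexed term that starts with the given prefix.
    List<String> termsWithPrefix(String prefix) {
        List<String> matches = new ArrayList<>();
        for (String t : terms) {
            if (t.startsWith(prefix)) matches.add(t);
        }
        return matches;
    }
}

// Hypothetical stand-in for Query: rewrite() returns the query unchanged by default.
abstract class ToyQuery {
    ToyQuery rewrite(ToyReader reader) { return this; }
}

// Hypothetical stand-in for MultiPhraseQuery: term arrays plus their positions.
class ToyMultiPhraseQuery extends ToyQuery {
    private final List<String[]> termArrays = new ArrayList<>();
    private final List<Integer> positions = new ArrayList<>();

    void add(String[] terms, int position) {
        termArrays.add(terms);
        positions.add(position);
    }

    List<String[]> getTermArrays() { return termArrays; }

    int[] getPositions() {
        return positions.stream().mapToInt(Integer::intValue).toArray();
    }
}

// Extends the base type and holds an MPQ instead of subclassing it, so an
// "instanceof MultiPhraseQuery" check no longer matches this query.
class WrappingPhraseQuery extends ToyQuery {
    private final ToyMultiPhraseQuery inner;

    WrappingPhraseQuery(ToyMultiPhraseQuery inner) { this.inner = inner; }

    // Delegate rather than copy MPQ's term-array and position bookkeeping.
    List<String[]> getTermArrays() { return inner.getTermArrays(); }

    int[] getPositions() { return inner.getPositions(); }

    // Illustrative reader-dependent logic: expand "prefix*" entries into the
    // matching indexed terms, keeping each array at its original position.
    @Override
    ToyQuery rewrite(ToyReader reader) {
        ToyMultiPhraseQuery rewritten = new ToyMultiPhraseQuery();
        List<String[]> arrays = inner.getTermArrays();
        int[] positions = inner.getPositions();
        for (int i = 0; i < arrays.size(); i++) {
            List<String> expanded = new ArrayList<>();
            for (String term : arrays.get(i)) {
                if (term.endsWith("*")) {
                    expanded.addAll(reader.termsWithPrefix(term.substring(0, term.length() - 1)));
                } else {
                    expanded.add(term);
                }
            }
            rewritten.add(expanded.toArray(new String[0]), positions[i]);
        }
        return rewritten;
    }
}
```

Delegating getTermArrays() and getPositions() to the wrapped instance avoids copying MPQ's bookkeeping, which was the alternative raised in the original question.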

Just wanted to bring this to a conclusion and make the solution searchable for 
others.

Thanks,
Thomas


From: David Smiley [mailto:[email protected]]
Sent: Wednesday, December 21, 2016 10:40 AM
To: [email protected]
Cc: Nate Ko <[email protected]>
Subject: Re: [5.5.x] Highlighter and rewriting MultiPhraseQuery

Hi Thomas,

This is a constant source of maintenance in Lucene: updating all of our
highlighters to be aware of new queries.  Some of them require more maintenance
than others; by far the hotspot is WSTE.  WSTE avoids calling
query.rewrite(IndexReader) because for some queries it can be quite expensive,
and in the face of custom queries it can't know for sure whether skipping it is
okay.  I recommend extending WSTE and adding some instanceof checks for your
custom queries so that you do whatever the right thing is for your query.  If
you use the UnifiedHighlighter, new in Lucene as of 6.3, there are several
callback hooks provided for this sort of thing that don't expose its internals
(it uses WSTE under the hood).
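A sketch of the "extend WSTE and add instanceof checks" suggestion, again on hypothetical stand-in types rather than the real WeightedSpanTermExtractor, whose overridable methods and signatures should be checked in the 5.5.x javadocs. `CustomAwareExtractor` checks for the custom query type first, rewrites it with the reader, and hands the result back to the stock logic; `PrefixPhraseQuery` is an illustrative custom query that subclasses the MPQ stand-in:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for IndexReader: just the indexed terms.
class ToyReader {
    final List<String> terms;
    ToyReader(String... terms) { this.terms = Arrays.asList(terms); }
}

// Hypothetical stand-ins for Query, TermQuery and MultiPhraseQuery.
abstract class ToyQuery {
    ToyQuery rewrite(ToyReader reader) { return this; }
}

class ToyTermQuery extends ToyQuery {
    final String term;
    ToyTermQuery(String term) { this.term = term; }
}

class ToyMultiPhraseQuery extends ToyQuery {
    final List<String[]> termArrays = new ArrayList<>();
    void add(String... terms) { termArrays.add(terms); }
}

// Illustrative custom query: its literal terms are prefixes that only become
// real terms after a reader-dependent rewrite.
class PrefixPhraseQuery extends ToyMultiPhraseQuery {
    @Override
    ToyQuery rewrite(ToyReader reader) {
        ToyMultiPhraseQuery expanded = new ToyMultiPhraseQuery();
        for (String[] prefixes : termArrays) {
            List<String> matches = new ArrayList<>();
            for (String p : prefixes) {
                for (String t : reader.terms) {
                    if (t.startsWith(p)) matches.add(t);
                }
            }
            expanded.add(matches.toArray(new String[0]));
        }
        return expanded;
    }
}

// Stand-in for the stock extractor: the MPQ branch reads term arrays directly
// and never rewrites; only unknown types fall through to rewrite().
class ToyExtractor {
    protected final ToyReader reader;
    ToyExtractor(ToyReader reader) { this.reader = reader; }

    protected void extract(ToyQuery query, List<String> out) {
        if (query instanceof ToyTermQuery) {
            out.add(((ToyTermQuery) query).term);
        } else if (query instanceof ToyMultiPhraseQuery) {
            for (String[] arr : ((ToyMultiPhraseQuery) query).termArrays) {
                out.addAll(Arrays.asList(arr));
            }
        } else {
            ToyQuery rewritten = query.rewrite(reader);
            if (rewritten != query) extract(rewritten, out);
        }
    }
}

// The suggested fix: subclass the extractor and handle the custom type first.
class CustomAwareExtractor extends ToyExtractor {
    CustomAwareExtractor(ToyReader reader) { super(reader); }

    @Override
    protected void extract(ToyQuery query, List<String> out) {
        if (query instanceof PrefixPhraseQuery) {
            // Rewrite with the reader, then let the stock logic handle the result.
            super.extract(query.rewrite(reader), out);
        } else {
            super.extract(query, out);
        }
    }
}
```

With the stock extractor the custom query contributes its raw prefixes; with the subclass it contributes the expanded terms. The custom check has to come before delegating, because the inherited MPQ branch would otherwise match the subclass first.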

BTW it's obvious to me that Query needs some sort of visitor API.  It's very 
much related to maintaining the highlighters with respect to new/different 
queries.  
https://issues.apache.org/jira/browse/LUCENE-3041
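To illustrate why a visitor helps with highlighter maintenance: if every query type implements a hypothetical `visit` method, the highlighter can collect terms without any per-type instanceof checks, and a custom query only has to describe itself. The interface and class names below are illustrative assumptions, not necessarily the API discussed in LUCENE-3041:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical visitor contract: queries report the terms they contain.
interface ToyQueryVisitor {
    void visitTerm(String field, String term);
}

// Hypothetical base query: each type describes itself to the visitor.
abstract class ToyQuery {
    abstract void visit(ToyQueryVisitor visitor);
}

class ToyTermQuery extends ToyQuery {
    final String field;
    final String term;
    ToyTermQuery(String field, String term) { this.field = field; this.term = term; }

    @Override
    void visit(ToyQueryVisitor visitor) { visitor.visitTerm(field, term); }
}

// Compound queries simply pass the visitor on to their clauses.
class ToyBooleanQuery extends ToyQuery {
    final List<ToyQuery> clauses;
    ToyBooleanQuery(ToyQuery... clauses) { this.clauses = Arrays.asList(clauses); }

    @Override
    void visit(ToyQueryVisitor visitor) {
        for (ToyQuery clause : clauses) clause.visit(visitor);
    }
}

// A custom query only implements visit(); the highlighter needs no new branch.
class CustomPhraseQuery extends ToyQuery {
    final String field;
    final String[] terms;
    CustomPhraseQuery(String field, String... terms) { this.field = field; this.terms = terms; }

    @Override
    void visit(ToyQueryVisitor visitor) {
        for (String t : terms) visitor.visitTerm(field, t);
    }
}

// Highlighter-side collector: gathers the terms for a single field.
class TermCollector implements ToyQueryVisitor {
    final String field;
    final List<String> terms = new ArrayList<>();
    TermCollector(String field) { this.field = field; }

    @Override
    public void visitTerm(String f, String term) {
        if (f.equals(field)) terms.add(term);
    }
}
```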

Good luck,

~ David

On Dec 21, 2016, at 1:27 PM, Thomas Kappler <[email protected]> wrote:

Hi,

We have implemented a custom query that extends MultiPhraseQuery (MPQ) because 
it uses MPQ’s getTermArrays() and getPositions(). We’d like to use this query 
for highlighting, but we’re facing the following issue.

In highlighter/WeightedSpanTermExtractor, the extract() method does a series of 
instanceof checks. There is a special case for MPQ. This branch does not call 
rewrite(IndexReader) on the query, but our custom query needs rewriting to work 
properly.

As a test, I commented out the MPQ branch in WeightedSpanTermExtractor so the
code takes the last else branch, where rewrite(IndexReader) is called, and our
tests pass.
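The dispatch problem can be modeled with a toy chain of instanceof checks. Everything here is a hypothetical stand-in, not Lucene's code; it only assumes what the message describes: a dedicated MPQ branch that skips rewrite, and a final branch that calls rewrite(IndexReader). Retrying extraction on the rewritten query is a modeling assumption. Because instanceof also matches subclasses, a query that extends MPQ never reaches the fallback, while a wrapper does:

```java
// Hypothetical stand-ins, not Lucene classes.
class ToyReader { }

abstract class ToyQuery {
    ToyQuery rewrite(ToyReader reader) { return this; }
}

class ToyMultiPhraseQuery extends ToyQuery { }

// Custom query as first written: subclasses MPQ and relies on rewrite().
class SubclassingQuery extends ToyMultiPhraseQuery {
    boolean rewriteCalled = false;

    @Override
    ToyQuery rewrite(ToyReader reader) {
        rewriteCalled = true;
        return new ToyMultiPhraseQuery();
    }
}

// Same idea, but extending the base type and wrapping an MPQ.
class WrappingQuery extends ToyQuery {
    final ToyMultiPhraseQuery inner;
    boolean rewriteCalled = false;

    WrappingQuery(ToyMultiPhraseQuery inner) { this.inner = inner; }

    @Override
    ToyQuery rewrite(ToyReader reader) {
        rewriteCalled = true; // reader-dependent work would happen here
        return inner;
    }
}

// Models the order of checks in extract(): specific types first,
// a generic rewrite-and-retry fallback last.
final class ToyExtractor {
    static String extract(ToyQuery query, ToyReader reader) {
        if (query instanceof ToyMultiPhraseQuery) {
            return "multiPhrase"; // handled directly; rewrite() is never called
        }
        ToyQuery rewritten = query.rewrite(reader);
        if (rewritten != query) {
            return "rewrite>" + extract(rewritten, reader);
        }
        return "unknown";
    }
}
```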

My questions are:

  *   Are we doing it wrong when our query *needs* rewriting? Our query logic
needs the IndexReader that’s passed to rewrite(IndexReader). Where else
would we put such code?
  *   If we aren’t doing it wrong, how can we use the highlighter? Extend Query 
instead of MPQ and copy the tracking of term arrays and positions from MPQ?

Thanks,
Thomas
