Hi David,

Thanks for your quick help! We ended up making our custom query extend Query and wrap a MultiPhraseQuery instance instead of extending MPQ. This way, WeightedSpanTermExtractor takes the branch that calls rewrite(IndexReader) on our query.
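To make the dispatch issue concrete, here is a toy model in plain Java. None of these classes are Lucene's: ToyQuery, ToyMultiPhraseQuery, SubclassQuery, WrapperQuery, expand and extract are hypothetical stand-ins that only imitate the control flow described in this thread, namely an MPQ special case that skips rewrite and a fallback branch that rewrites first. The "reader" is modeled as a plain list of indexed terms, and the custom query's reader-dependent logic is assumed to be a simple prefix expansion.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: these classes imitate the dispatch described in the thread, not Lucene's API.
public class WrapperDispatchDemo {

    /** Stand-in for Query; by default rewrite returns the query unchanged. */
    static class ToyQuery {
        ToyQuery rewrite(List<String> indexTerms) {
            return this;
        }
    }

    /** Stand-in for MultiPhraseQuery; the extractor reads its terms directly. */
    static class ToyMultiPhraseQuery extends ToyQuery {
        final List<String> terms = new ArrayList<>();
    }

    /** Hypothetical reader-dependent logic: expand each prefix against the indexed terms. */
    static ToyMultiPhraseQuery expand(List<String> prefixes, List<String> indexTerms) {
        ToyMultiPhraseQuery expanded = new ToyMultiPhraseQuery();
        for (String prefix : prefixes) {
            for (String term : indexTerms) {
                if (term.startsWith(prefix)) {
                    expanded.terms.add(term);
                }
            }
        }
        return expanded;
    }

    /** Original design: extend the MPQ stand-in and rely on rewrite being called. */
    static class SubclassQuery extends ToyMultiPhraseQuery {
        SubclassQuery(String prefix) {
            terms.add(prefix);
        }

        @Override
        ToyQuery rewrite(List<String> indexTerms) {
            return expand(terms, indexTerms);
        }
    }

    /** Adopted design: extend the Query stand-in and wrap an MPQ instance. */
    static class WrapperQuery extends ToyQuery {
        final ToyMultiPhraseQuery inner = new ToyMultiPhraseQuery();

        WrapperQuery(String prefix) {
            inner.terms.add(prefix);
        }

        /** Delegates to the wrapped query, the way a real wrapper might for getTermArrays(). */
        List<String> getTerms() {
            return inner.terms;
        }

        @Override
        ToyQuery rewrite(List<String> indexTerms) {
            return expand(inner.terms, indexTerms);
        }
    }

    /** Imitates the extractor: an MPQ special case without rewrite, then a fallback that rewrites. */
    static List<String> extract(ToyQuery query, List<String> indexTerms) {
        if (query instanceof ToyMultiPhraseQuery) {
            // Special case: the terms are used as-is and rewrite is never called.
            return ((ToyMultiPhraseQuery) query).terms;
        }
        // Fallback ("last else") branch: rewrite against the reader, then extract from the result.
        ToyQuery rewritten = query.rewrite(indexTerms);
        return rewritten == query ? new ArrayList<String>() : extract(rewritten, indexTerms);
    }

    public static void main(String[] args) {
        List<String> index = Arrays.asList("quick", "quiet", "slow");
        System.out.println(extract(new SubclassQuery("qu"), index)); // [qu]
        System.out.println(extract(new WrapperQuery("qu"), index));  // [quick, quiet]
    }
}
```

Because WrapperQuery is not an instance of the MPQ stand-in, the extractor skips the special case and reaches the fallback branch, where rewrite replaces the unexpanded prefix with concrete terms. The subclass, by contrast, is caught by the special case and its raw prefix is used as-is.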
Just wanted to bring this to a conclusion and make the solution searchable for others.

Thanks,
Thomas

From: David Smiley
Sent: Wednesday, December 21, 2016 10:40 AM
Cc: Nate Ko
Subject: Re: [5.5.x] Highlighter and rewriting MultiPhraseQuery

Hi Thomas,

This is a constant source of maintenance in Lucene: updating all of our highlighters to be aware of new queries. Some of them require more maintenance than others; WSTE (WeightedSpanTermExtractor) is by far the hotspot. WSTE avoids calling query.rewrite(IndexReader) because for some queries it can be quite expensive. In the face of custom queries, it can't know for sure whether skipping the rewrite is okay. I recommend extending WSTE and adding some instanceof checks for your custom queries so that you do whatever the right thing is for your query. If you use the UnifiedHighlighter, new in Lucene as of 6.3, there are several callback hooks provided for this sort of thing, without exposing its internals (which use WSTE).

BTW, it's obvious to me that Query needs some sort of visitor API. It's very much related to maintaining the highlighters with respect to new/different queries.
https://issues.apache.org/jira/browse/LUCENE-3041

Good luck,
~ David

On Dec 21, 2016, at 1:27 PM, Thomas Kappler wrote:

Hi,

We have implemented a custom query that extends MultiPhraseQuery (MPQ) because it uses MPQ's getTermArrays() and getPositions(). We'd like to use this query for highlighting, but we're facing the following issue. In highlighter/WeightedSpanTermExtractor, the extract() method does a series of instanceof checks.
There is a special case for MPQ. This branch does not call rewrite(IndexReader) on the query, but our custom query needs rewriting to work properly. As a test, I commented out the MPQ branch in WeightedSpanTermExtractor so that the code takes the last else branch, where rewrite(IndexReader) is called, and our tests pass.

My questions are:
* Are we doing it wrong when our query *needs* rewriting? Our query logic needs the IndexReader that's passed to rewrite(IndexReader). Where else would we put such code?
* If we aren't doing it wrong, how can we use the highlighter? Extend Query instead of MPQ and copy the tracking of term arrays and positions from MPQ?

Thanks,
Thomas
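David's other suggestion, extending WSTE and adding instanceof checks for the custom query, can be sketched with a toy model in plain Java. BaseExtractor, CustomAwareExtractor, CustomPrefixPhraseQuery and the other classes below are hypothetical stand-ins, not Lucene classes; the real WeightedSpanTermExtractor methods and their signatures differ between versions, so check them for 5.5.x before overriding anything. Here the custom query keeps extending the MPQ stand-in, and the subclassed extractor catches it before the MPQ special case and rewrites it first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: none of these classes are Lucene's; they imitate the extractor's control flow.
public class ExtractorOverrideDemo {

    /** Stand-in for Query; by default rewrite returns the query unchanged. */
    static class ToyQuery {
        ToyQuery rewrite(List<String> indexTerms) {
            return this;
        }
    }

    /** Stand-in for MultiPhraseQuery; its terms are read directly by the extractor. */
    static class ToyMultiPhraseQuery extends ToyQuery {
        final List<String> terms = new ArrayList<>();
    }

    /** Hypothetical custom query: still extends the MPQ stand-in, but needs the reader to be correct. */
    static class CustomPrefixPhraseQuery extends ToyMultiPhraseQuery {
        CustomPrefixPhraseQuery(String prefix) {
            terms.add(prefix);
        }

        @Override
        ToyQuery rewrite(List<String> indexTerms) {
            ToyMultiPhraseQuery expanded = new ToyMultiPhraseQuery();
            for (String prefix : terms) {
                for (String term : indexTerms) {
                    if (term.startsWith(prefix)) {
                        expanded.terms.add(term);
                    }
                }
            }
            return expanded;
        }
    }

    /** Stand-in for the stock extractor: its MPQ branch never calls rewrite. */
    static class BaseExtractor {
        List<String> extract(ToyQuery query, List<String> indexTerms) {
            if (query instanceof ToyMultiPhraseQuery) {
                return ((ToyMultiPhraseQuery) query).terms;
            }
            ToyQuery rewritten = query.rewrite(indexTerms);
            return rewritten == query ? new ArrayList<String>() : extract(rewritten, indexTerms);
        }
    }

    /** The suggested approach: subclass the extractor and add an instanceof check for the custom query. */
    static class CustomAwareExtractor extends BaseExtractor {
        @Override
        List<String> extract(ToyQuery query, List<String> indexTerms) {
            if (query instanceof CustomPrefixPhraseQuery) {
                // Rewrite first, then hand the concrete query to the stock logic.
                return super.extract(query.rewrite(indexTerms), indexTerms);
            }
            return super.extract(query, indexTerms);
        }
    }

    public static void main(String[] args) {
        List<String> index = Arrays.asList("quick", "quiet", "slow");
        ToyQuery query = new CustomPrefixPhraseQuery("qu");
        System.out.println(new BaseExtractor().extract(query, index));        // [qu]
        System.out.println(new CustomAwareExtractor().extract(query, index)); // [quick, quiet]
    }
}
```

This keeps the custom query's class hierarchy intact and confines the workaround to the highlighter side. With the real highlighter, the subclassed extractor would also have to be plugged into whatever creates it (for example, a QueryScorer subclass); that wiring is not shown here.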
