David Smiley created LUCENE-8184:

             Summary: Enable flexible Query.rewrite
                 Key: LUCENE-8184
                 URL: https://issues.apache.org/jira/browse/LUCENE-8184
             Project: Lucene - Core
          Issue Type: Improvement
            Reporter: David Smiley

I think {{Query.rewrite(IndexReader)}} should be generalized a bit to enable 
users to customize the rewrite process outside of the Query classes (i.e. 
without having to create a custom Query just to implement rewrite).  This could 
be as simple as having rewrite accept a QueryRewriter parameter that has a 
method that accepts a Query to be rewritten.  Only this method would call 
rewrite on any given Query; Query implementations would hand their child 
queries to it rather than rewriting them directly.  And given that very few 
spots actually use the IndexReader arg, we could even remove it as a parameter 
and add a getter for it to QueryRewriter (which is allowed to return null).  Or 
create a subclass, e.g. 
QueryRewriterWithIndexReader if some prefer casting; debatable.
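
To make the shape of the proposal concrete, here is a minimal sketch using toy classes rather than the real Lucene ones.  Everything in it is an assumption for illustration: the QueryRewriter method names, the nullable getIndexReader() getter, and the simplified TermQuery/BooleanQuery stand-ins are not an existing Lucene API.  FieldRenamer is a hypothetical customization that shows an app intercepting rewrite without defining a custom Query.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class QueryRewriterSketch {
    /** Hypothetical stand-in for Lucene's IndexReader. */
    interface IndexReader {}

    /** Hypothetical callback; its rewrite method is the only caller of Query.rewrite. */
    static class QueryRewriter {
        /** May return null when no reader is available, as the proposal allows. */
        IndexReader getIndexReader() { return null; }

        /** Default: let the query rewrite itself. Subclasses may intercept first. */
        Query rewrite(Query query) { return query.rewrite(this); }
    }

    /** Toy Query whose rewrite takes the rewriter instead of an IndexReader. */
    abstract static class Query {
        Query rewrite(QueryRewriter rewriter) { return this; }
    }

    static final class TermQuery extends Query {
        final String field;
        final String text;
        TermQuery(String field, String text) { this.field = field; this.text = text; }
        @Override public String toString() { return field + ":" + text; }
    }

    static final class BooleanQuery extends Query {
        final List<Query> clauses;
        BooleanQuery(List<Query> clauses) { this.clauses = clauses; }

        @Override Query rewrite(QueryRewriter rewriter) {
            List<Query> rewritten = new ArrayList<>();
            boolean changed = false;
            for (Query clause : clauses) {
                // Children go through the rewriter, never clause.rewrite(...) directly.
                Query r = rewriter.rewrite(clause);
                changed |= (r != clause);
                rewritten.add(r);
            }
            return changed ? new BooleanQuery(rewritten) : this;
        }

        @Override public String toString() {
            return clauses.stream().map(Object::toString)
                    .collect(Collectors.joining(" ", "(", ")"));
        }
    }

    /** Example customization: swap a field name without subclassing any Query. */
    static final class FieldRenamer extends QueryRewriter {
        final String from;
        final String to;
        FieldRenamer(String from, String to) { this.from = from; this.to = to; }

        @Override Query rewrite(Query query) {
            if (query instanceof TermQuery && ((TermQuery) query).field.equals(from)) {
                return new TermQuery(to, ((TermQuery) query).text);
            }
            return super.rewrite(query);
        }
    }

    public static void main(String[] args) {
        Query query = new BooleanQuery(Arrays.asList(
                new TermQuery("title", "lucene"), new TermQuery("body", "rewrite")));
        System.out.println(new FieldRenamer("title", "title_exact").rewrite(query));
    }
}
```

Because BooleanQuery hands each child back to the rewriter, the renamer sees nested TermQuery instances without knowing anything about BooleanQuery itself.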

Today, users have to hard-code Lucene class names with related logic for each 
one.  This is tedious, brittle as Lucene adds new Query classes, and tends to 
be duplicative.  Examples of why an app might want to rewrite a query:
 * to replace position-sensitive queries that are not already SpanQuery 
instances with their SpanQuery equivalents.  This is useful in highlighting -- 
Luwak's SpanRewriter does this.
 * to simplify BooleanQuery instances to a canonical form, or other 
canonicalization such as unwrapping a BoostQuery whose boost is 1.  The point 
is to simplify or strengthen the 
accuracy of query examination logic for whatever further purpose (e.g. routing 
a query for an optimization).  
 * to replace one field with another
 * to "fix" pure negative queries so that they work (by adding a 
MatchAllDocsQuery).  I'm surprised we still live with this.
 * to relax a query that doesn't match to a looser one that does (e.g. 
manipulate minimumNumberShouldMatch) without re-parsing the query.  Granted, 
re-parsing allows using different analysis or other strategies.
 * to make it easier to use a Lucene Query class as a base class during query 
parsing/building. You could rewrite to strip out/replace only the AST nodes and 
leave the real Lucene Queries as-is.
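
As a rough illustration of two of the use cases above (unwrapping a boost of 1 and fixing pure negative queries), here is a sketch against toy classes.  The QueryRewriter type and the simplified BoostQuery, BooleanQuery, Clause and MatchAllDocsQuery stand-ins are assumptions made for this example and only mimic the real Lucene classes by name; Canonicalizer is a hypothetical customization.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class RewriteUseCasesSketch {
    enum Occur { MUST, SHOULD, MUST_NOT }

    /** Hypothetical callback, as proposed: the single caller of Query.rewrite. */
    static class QueryRewriter {
        Query rewrite(Query query) { return query.rewrite(this); }
    }

    abstract static class Query {
        Query rewrite(QueryRewriter rewriter) { return this; }
    }

    static final class TermQuery extends Query {
        final String field, text;
        TermQuery(String field, String text) { this.field = field; this.text = text; }
        @Override public String toString() { return field + ":" + text; }
    }

    static final class MatchAllDocsQuery extends Query {
        @Override public String toString() { return "*:*"; }
    }

    static final class BoostQuery extends Query {
        final Query inner;
        final float boost;
        BoostQuery(Query inner, float boost) { this.inner = inner; this.boost = boost; }
        @Override Query rewrite(QueryRewriter rewriter) {
            Query r = rewriter.rewrite(inner);
            return r == inner ? this : new BoostQuery(r, boost);
        }
        @Override public String toString() { return "(" + inner + ")^" + boost; }
    }

    static final class Clause {
        final Occur occur;
        final Query query;
        Clause(Occur occur, Query query) { this.occur = occur; this.query = query; }
        @Override public String toString() {
            String prefix = occur == Occur.MUST ? "+" : occur == Occur.MUST_NOT ? "-" : "";
            return prefix + query;
        }
    }

    static final class BooleanQuery extends Query {
        final List<Clause> clauses;
        BooleanQuery(List<Clause> clauses) { this.clauses = clauses; }
        @Override Query rewrite(QueryRewriter rewriter) {
            List<Clause> out = new ArrayList<>();
            boolean changed = false;
            for (Clause c : clauses) {
                Query r = rewriter.rewrite(c.query);
                changed |= (r != c.query);
                out.add(r == c.query ? c : new Clause(c.occur, r));
            }
            return changed ? new BooleanQuery(out) : this;
        }
        @Override public String toString() {
            return clauses.stream().map(Clause::toString)
                    .collect(Collectors.joining(" ", "(", ")"));
        }
    }

    /** Unwraps boost-of-1 wrappers and repairs pure negative BooleanQuery instances. */
    static final class Canonicalizer extends QueryRewriter {
        @Override Query rewrite(Query query) {
            Query out = super.rewrite(query); // children are handled first
            if (out instanceof BoostQuery && ((BoostQuery) out).boost == 1f) {
                return ((BoostQuery) out).inner;
            }
            if (out instanceof BooleanQuery) {
                List<Clause> clauses = ((BooleanQuery) out).clauses;
                boolean pureNegative = !clauses.isEmpty()
                        && clauses.stream().allMatch(c -> c.occur == Occur.MUST_NOT);
                if (pureNegative) {
                    List<Clause> fixed = new ArrayList<>(clauses);
                    fixed.add(0, new Clause(Occur.MUST, new MatchAllDocsQuery()));
                    return new BooleanQuery(fixed);
                }
            }
            return out;
        }
    }

    public static void main(String[] args) {
        Query query = new BooleanQuery(Arrays.asList(
                new Clause(Occur.MUST_NOT, new BoostQuery(new TermQuery("status", "deleted"), 1f))));
        System.out.println(new Canonicalizer().rewrite(query)); // prints (+*:* -status:deleted)
    }
}
```

Both rules live in one rewriter and apply at any nesting depth, which is the kind of logic that otherwise ends up duplicated per Query class in application code.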

Finally, until LUCENE-3041 (generic Query visitor) is addressed, a customizable 
rewrite would allow a generic query visitor using a QueryRewriter that doesn't 
actually rewrite anything.  It's a little abusive as it does wasted work and 
no rewrite actually occurs, but I think the overhead needn't be that much, 
and such a use-case might even special-case BooleanQuery in particular to lower 
the overhead further.  Basically, for known many-child aggregator Query classes, 
customize to simply delegate.
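
A sketch of that visitor idea, again with toy stand-ins rather than real Lucene classes (QueryRewriter, TermCollector and the simplified Query types are all assumptions for illustration): the rewriter returns every query unchanged and only records what it sees, and it special-cases BooleanQuery so that the children are walked directly instead of going through BooleanQuery.rewrite and its list building.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class VisitingRewriterSketch {
    /** Hypothetical callback, as proposed: the single caller of Query.rewrite. */
    static class QueryRewriter {
        Query rewrite(Query query) { return query.rewrite(this); }
    }

    abstract static class Query {
        Query rewrite(QueryRewriter rewriter) { return this; }
    }

    static final class TermQuery extends Query {
        final String field, text;
        TermQuery(String field, String text) { this.field = field; this.text = text; }
        @Override public String toString() { return field + ":" + text; }
    }

    /** A wrapper type the visitor has no special knowledge of. */
    static final class BoostQuery extends Query {
        final Query inner;
        final float boost;
        BoostQuery(Query inner, float boost) { this.inner = inner; this.boost = boost; }
        @Override Query rewrite(QueryRewriter rewriter) {
            Query r = rewriter.rewrite(inner);
            return r == inner ? this : new BoostQuery(r, boost);
        }
    }

    static final class BooleanQuery extends Query {
        final List<Query> clauses;
        BooleanQuery(List<Query> clauses) { this.clauses = clauses; }
        @Override Query rewrite(QueryRewriter rewriter) {
            List<Query> out = new ArrayList<>();
            boolean changed = false;
            for (Query clause : clauses) {
                Query r = rewriter.rewrite(clause);
                changed |= (r != clause);
                out.add(r);
            }
            return changed ? new BooleanQuery(out) : this;
        }
    }

    /** A "rewriter" that rewrites nothing and just collects the terms it visits. */
    static final class TermCollector extends QueryRewriter {
        final List<String> terms = new ArrayList<>();

        @Override Query rewrite(Query query) {
            if (query instanceof TermQuery) {
                terms.add(query.toString());
                return query;
            }
            if (query instanceof BooleanQuery) {
                // Special case: simply delegate to the children, skipping the
                // temporary list that BooleanQuery.rewrite would allocate.
                for (Query clause : ((BooleanQuery) query).clauses) {
                    rewrite(clause);
                }
                return query;
            }
            // Unknown types: their own rewrite routes children back to this method.
            return super.rewrite(query);
        }
    }

    public static void main(String[] args) {
        Query query = new BooleanQuery(Arrays.asList(
                new TermQuery("a", "1"),
                new BoostQuery(new TermQuery("b", "2"), 2f),
                new BooleanQuery(Arrays.asList(new TermQuery("c", "3")))));
        TermCollector collector = new TermCollector();
        collector.rewrite(query);
        System.out.println(collector.terms); // prints [a:1, b:2, c:3]
    }
}
```

The fallback path relies on every Query routing its children through the rewriter; in this toy model that path returns the same instance when nothing changed, so the wasted work is limited to the traversal itself.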

