David Smiley created LUCENE-8184:
------------------------------------
Summary: Enable flexible Query.rewrite
Key: LUCENE-8184
URL: https://issues.apache.org/jira/browse/LUCENE-8184
Project: Lucene - Core
Issue Type: Improvement
Reporter: David Smiley
I think {{Query.rewrite(IndexReader)}} should be generalized a bit so that
users can customize the rewrite process outside of the Query classes (i.e.
without having to create a custom Query just to implement rewrite). This could
be as simple as having rewrite accept a QueryRewriter parameter with a method
that accepts the Query to be rewritten. Only this method would call rewrite on
any given Query. And given that very few places actually use the IndexReader
argument, we could even remove it as a parameter and add a getter to
QueryRewriter (which is allowed to return null). Alternatively, if some prefer
casting, we could create a subclass such as QueryRewriterWithIndexReader; that
is debatable.
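A minimal sketch of the proposed shape, assuming a toy query model: `ToyQuery`, the simplified look-alikes `TermQuery`, `BoostQuery` and `BooleanQuery`, `IndexReaderStub`, and `FieldRenamer` are all hypothetical stand-ins written for illustration, not Lucene's classes, and `QueryRewriter` follows the description above rather than any committed Lucene API. The key point is that composite queries hand each child back to the rewriter instead of calling the child's rewrite directly.

```java
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for Lucene's IndexReader (NOT the real class).
interface IndexReaderStub {}

// Hypothetical QueryRewriter as proposed: the only code that calls
// ToyQuery.rewrite, so overriding rewrite() intercepts every node in a tree.
interface QueryRewriter {
  // Proposed getter replacing the IndexReader parameter; may return null.
  default IndexReaderStub getIndexReader() {
    return null;
  }

  // Default behaviour: let the query rewrite itself, passing this rewriter
  // down so that its children are routed back through this method.
  default ToyQuery rewrite(ToyQuery query) {
    return query.rewrite(this);
  }
}

// Toy query model (hypothetical; not Lucene's Query hierarchy).
interface ToyQuery {
  ToyQuery rewrite(QueryRewriter rewriter);
}

record TermQuery(String field, String text) implements ToyQuery {
  public ToyQuery rewrite(QueryRewriter rewriter) {
    return this; // leaf: nothing to rewrite
  }
}

record BoostQuery(ToyQuery query, float boost) implements ToyQuery {
  public ToyQuery rewrite(QueryRewriter rewriter) {
    // Route the child through the rewriter, not query.rewrite(...) directly.
    return new BoostQuery(rewriter.rewrite(query), boost);
  }
}

record BooleanQuery(List<ToyQuery> should) implements ToyQuery {
  public ToyQuery rewrite(QueryRewriter rewriter) {
    return new BooleanQuery(should.stream().map(rewriter::rewrite).toList());
  }
}

// A user-defined rewriter that renames fields without subclassing any query.
record FieldRenamer(Map<String, String> renames) implements QueryRewriter {
  @Override
  public ToyQuery rewrite(ToyQuery query) {
    if (query instanceof TermQuery t) {
      return new TermQuery(renames.getOrDefault(t.field(), t.field()), t.text());
    }
    return QueryRewriter.super.rewrite(query); // delegate everything else
  }
}
```

For example, `new FieldRenamer(Map.of("title", "name")).rewrite(q)` renames the field on every `TermQuery` in `q`, including one nested under a `BoostQuery`, without any query class knowing about field renaming.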
Today, users have to hard-code Lucene query class names, with related logic for
each one. This is obviously annoying and tedious, it is brittle as Lucene adds
new queries, and it tends to be duplicative. Examples of why an app might want
to rewrite a query:
* to replace position-sensitive queries that are not already SpanQuerys with
their SpanQuery equivalents. This is useful in highlighting; Luwak's
SpanRewriter does this.
* to simplify BooleanQuerys to a canonical form, or other canonicalization
such as unwrapping a BoostQuery whose boost is 1. The point is to simplify
query-examination logic, or make it more accurate, for whatever further purpose
(e.g. routing a query to an optimization).
* to replace one field with another
* to "fix" pure negative queries so that they work (by adding a
MatchAllDocsQuery). I'm surprised we still live with this.
* to relax a query that doesn't match to a looser one that does (e.g.
manipulate minimumNumberShouldMatch) without re-parsing the query. Granted,
re-parsing affords using different analysis or other strategies.
* to make it easier to use a Lucene Query class as a base class during query
parsing/building. You could rewrite to strip out/replace only the AST nodes and
leave the real Lucene Queries as-is.
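A few of the rewrites listed above can be sketched against the proposed QueryRewriter, again assuming a toy model: `ToyQuery`, `Clause`, `Occur`, `CanonicalizingRewriter`, `RelaxingRewriter` and the look-alike query records are hypothetical and only mimic Lucene's names. Adding the MatchAllDocsQuery as a MUST clause is one assumed way to "fix" a pure negative query.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: these types mimic Lucene's names but are NOT the Lucene API.
enum Occur { MUST, SHOULD, MUST_NOT }

// Hypothetical QueryRewriter as proposed in the issue.
interface QueryRewriter {
  default ToyQuery rewrite(ToyQuery query) {
    return query.rewrite(this);
  }
}

interface ToyQuery {
  ToyQuery rewrite(QueryRewriter rewriter);
}

record TermQuery(String field, String text) implements ToyQuery {
  public ToyQuery rewrite(QueryRewriter r) { return this; }
}

record MatchAllDocsQuery() implements ToyQuery {
  public ToyQuery rewrite(QueryRewriter r) { return this; }
}

record BoostQuery(ToyQuery query, float boost) implements ToyQuery {
  public ToyQuery rewrite(QueryRewriter r) { return new BoostQuery(r.rewrite(query), boost); }
}

record Clause(ToyQuery query, Occur occur) {}

record BooleanQuery(List<Clause> clauses, int minimumShouldMatch) implements ToyQuery {
  public ToyQuery rewrite(QueryRewriter r) {
    List<Clause> out = new ArrayList<>();
    for (Clause c : clauses) out.add(new Clause(r.rewrite(c.query()), c.occur()));
    return new BooleanQuery(out, minimumShouldMatch);
  }
}

// Canonicalizes two of the cases listed above, bottom-up (children first).
class CanonicalizingRewriter implements QueryRewriter {
  @Override
  public ToyQuery rewrite(ToyQuery query) {
    ToyQuery q = query.rewrite(this); // rewrite children first
    // A boost of 1 is a no-op: unwrap it.
    if (q instanceof BoostQuery b && b.boost() == 1f) return b.query();
    // A purely negative BooleanQuery matches nothing: add a MatchAllDocsQuery.
    if (q instanceof BooleanQuery bq && !bq.clauses().isEmpty()
        && bq.clauses().stream().allMatch(c -> c.occur() == Occur.MUST_NOT)) {
      List<Clause> fixed = new ArrayList<>(bq.clauses());
      fixed.add(new Clause(new MatchAllDocsQuery(), Occur.MUST));
      return new BooleanQuery(fixed, bq.minimumShouldMatch());
    }
    return q;
  }
}

// Relaxes minimumShouldMatch by one on every BooleanQuery, without re-parsing.
class RelaxingRewriter implements QueryRewriter {
  @Override
  public ToyQuery rewrite(ToyQuery query) {
    ToyQuery q = query.rewrite(this);
    if (q instanceof BooleanQuery bq && bq.minimumShouldMatch() > 0) {
      return new BooleanQuery(bq.clauses(), bq.minimumShouldMatch() - 1);
    }
    return q;
  }
}
```

Each rewriter is an ordinary class that an application could keep alongside its own code; none of the query classes needs to know about these policies.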
Finally, until LUCENE-3041 (generic Query visitor) is addressed, a customizable
rewrite would allow a generic query visitor using a QueryRewriter that doesn't
actually rewrite anything. It's a little abusive, since it does wasted work and
no rewrite actually occurs, but I think the overhead needn't be that large, and
such a use case might even special-case BooleanQuery to lower the overhead
further. Basically, for known many-child aggregator Query classes, customize the
rewriter to simply delegate to the children.
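The visitor-by-rewriter idea might look like the following sketch, assuming the same kind of hypothetical toy model (`ToyQuery`, the look-alike query records, and `TermCollector` are illustrative, not Lucene classes). The rewriter collects terms and never changes anything; for the many-child BooleanQuery it visits children directly and returns the original instance, while other composites still go through their normal rewrite and produce an equal copy, which is the wasted work mentioned above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical QueryRewriter as proposed in the issue.
interface QueryRewriter {
  default ToyQuery rewrite(ToyQuery query) {
    return query.rewrite(this);
  }
}

// Toy query model (hypothetical; not Lucene's Query hierarchy).
interface ToyQuery {
  ToyQuery rewrite(QueryRewriter rewriter);
}

record TermQuery(String field, String text) implements ToyQuery {
  public ToyQuery rewrite(QueryRewriter r) { return this; }
}

record BoostQuery(ToyQuery query, float boost) implements ToyQuery {
  public ToyQuery rewrite(QueryRewriter r) { return new BoostQuery(r.rewrite(query), boost); }
}

record BooleanQuery(List<ToyQuery> should) implements ToyQuery {
  public ToyQuery rewrite(QueryRewriter r) {
    return new BooleanQuery(should.stream().map(r::rewrite).toList());
  }
}

// A "visitor" built on a rewriter that never changes the query.
class TermCollector implements QueryRewriter {
  final List<TermQuery> terms = new ArrayList<>();

  @Override
  public ToyQuery rewrite(ToyQuery query) {
    if (query instanceof TermQuery t) {
      terms.add(t);
      return t;
    }
    if (query instanceof BooleanQuery bq) {
      // Special case for a many-child aggregator: visit the children directly
      // and return the original instance instead of rebuilding the clause list.
      for (ToyQuery child : bq.should()) rewrite(child);
      return bq;
    }
    // Other queries use their normal rewrite; the result equals the input.
    return QueryRewriter.super.rewrite(query);
  }
}
```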
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)