Jack Krupansky created LUCENE-4386:
--------------------------------------
Summary: Query parser should generate FieldValueFilter for pure
wildcard terms to boost query performance
Key: LUCENE-4386
URL: https://issues.apache.org/jira/browse/LUCENE-4386
Project: Lucene - Core
Issue Type: Improvement
Components: core/queryparser
Affects Versions: 4.0-BETA
Reporter: Jack Krupansky
Fix For: 4.0
In theory, a simple pure wildcard query (a single asterisk) is an inefficient
way to select all documents that have any value in a field. Rather than users
having to work around this issue by adding a separate boolean "has" field, it
would be better to have the query parser directly generate the most efficient
Lucene query for detecting all documents that have any value for a specified
field. According to the discussion over on LUCENE-4376, the FieldValueFilter is
the proper solution.
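To make the inefficiency concrete, here is a toy model in plain Java. It is not Lucene code, and the class HasValueModel and its methods are hypothetical. A pure wildcard has to enumerate every term in the field and merge their postings, so its cost grows with the number of distinct terms. A per-field "has value" bitset answers the same question with a single lookup. In this sketch the bitset is maintained at index time, much like the separate "has" field workaround. My understanding is that FieldValueFilter instead gets its bitset from Lucene's field cache.

```java
import java.util.BitSet;
import java.util.TreeMap;

// Toy single-field index; hypothetical, not Lucene's data structures or API.
class HasValueModel {
    // term -> set of documents containing that term
    private final TreeMap<String, BitSet> postings = new TreeMap<>();
    // documents with at least one value in the field (the "has value" idea)
    private final BitSet docsWithField = new BitSet();
    // number of terms visited by the most recent wildcard expansion
    int lastTermsVisited;

    void index(int doc, String... terms) {
        for (String term : terms) {
            postings.computeIfAbsent(term, t -> new BitSet()).set(doc);
            docsWithField.set(doc);
        }
    }

    // Pure wildcard: enumerate every term in the field and OR its postings.
    BitSet expandPureWildcard() {
        BitSet result = new BitSet();
        lastTermsVisited = 0;
        for (BitSet docs : postings.values()) {
            lastTermsVisited++;
            result.or(docs);
        }
        return result;
    }

    // Field-value approach: one bitset lookup, independent of the term count.
    BitSet docsWithAnyValue() {
        return (BitSet) docsWithField.clone();
    }

    public static void main(String[] args) {
        HasValueModel m = new HasValueModel();
        m.index(0, "apache", "lucene");
        m.index(2, "solr"); // doc 1 has no value in this field
        BitSet viaWildcard = m.expandPureWildcard();
        BitSet viaBitset = m.docsWithAnyValue();
        // prints "{0, 2} {0, 2} terms visited: 3"
        System.out.println(viaWildcard + " " + viaBitset + " terms visited: " + m.lastTermsVisited);
    }
}
```

Both approaches select documents 0 and 2 here, but the wildcard expansion touches all 3 terms. With many distinct terms in a field, that gap is presumably where the speedup would come from.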
Proposed solution:
QueryParserBase.getPrefixQuery could detect when the query is a pure wildcard
(a single asterisk) and then generate a FieldValueFilter instead of a
PrefixQuery. My understanding from LUCENE-4376 is that the following would work:
{code}
// negate = false: match documents that DO have a value in fieldname
new ConstantScoreQuery(new FieldValueFilter(fieldname, false))
{code}
Oh, and the check for whether "leading wildcard" is enabled would need to be
bypassed for this case.
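A rough, self-contained model of the proposed decision logic follows. The class PureWildcardSketch and its queryFor method are hypothetical, and strings stand in for the Lucene Query objects that the real change in QueryParserBase would construct. The one assumption that matters is ordering: the pure-wildcard test runs before the leading-wildcard test.

```java
// Hypothetical model of the proposed parser decision. Strings stand in for
// the Lucene Query objects that QueryParserBase would actually build.
class PureWildcardSketch {
    // Stand-in for the parser's "allow leading wildcard" setting.
    private final boolean allowLeadingWildcard;

    PureWildcardSketch(boolean allowLeadingWildcard) {
        this.allowLeadingWildcard = allowLeadingWildcard;
    }

    String queryFor(String field, String text) {
        // Pure wildcard is handled first, so the leading-wildcard check is bypassed.
        if (text.equals("*")) {
            return "ConstantScoreQuery(FieldValueFilter(" + field + ", false))";
        }
        if (!allowLeadingWildcard && (text.startsWith("*") || text.startsWith("?"))) {
            throw new IllegalArgumentException("Leading wildcard not allowed: " + text);
        }
        int star = text.indexOf('*');
        boolean hasQuestionMark = text.indexOf('?') >= 0;
        if (star < 0 && !hasQuestionMark) {
            return "TermQuery(" + field + ":" + text + ")";
        }
        // A single trailing asterisk is an ordinary prefix query.
        if (!hasQuestionMark && star == text.length() - 1) {
            return "PrefixQuery(" + field + ":" + text.substring(0, star) + ")";
        }
        return "WildcardQuery(" + field + ":" + text + ")";
    }

    public static void main(String[] args) {
        PureWildcardSketch parser = new PureWildcardSketch(false);
        System.out.println(parser.queryFor("title", "*"));    // ConstantScoreQuery(FieldValueFilter(title, false))
        System.out.println(parser.queryFor("title", "luc*")); // PrefixQuery(title:luc)
    }
}
```

Placing the pure-wildcard test ahead of the leading-wildcard test is what lets `title:*` through even when leading wildcards are disabled, while `title:*ene` is still rejected.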
I still think it would be better to have PrefixQuery perform this optimization
internally so that all apps would benefit, but this should be sufficient to
address the main concern.
This change would improve the classic Lucene query parser and the query parsers
based on it, including edismax. Other query parsers may not see the impact of
this change, but they can be updated separately.
How much of a performance benefit? Unknown, but supposedly significant. The goal
is simply for a single pure wildcard to be the obvious way to select documents
that have a value in a field.