Jack Krupansky created LUCENE-4386:
--------------------------------------

             Summary: Query parser should generate FieldValueFilter for pure 
wildcard terms to boost query performance
                 Key: LUCENE-4386
                 URL: https://issues.apache.org/jira/browse/LUCENE-4386
             Project: Lucene - Core
          Issue Type: Improvement
          Components: core/queryparser
    Affects Versions: 4.0-BETA
            Reporter: Jack Krupansky
             Fix For: 4.0


In theory, a simple pure wildcard query (a single asterisk) is an inefficient 
way to select all documents that have any value in a field. Rather than users 
having to work around this issue by adding a separate boolean "has" field, it 
would be better to have the query parser directly generate the most efficient 
Lucene query for detecting all documents that have any value for a specified 
field. According to the discussion over on LUCENE-4376, the FieldValueFilter is 
the proper solution.

Proposed solution:

QueryParserBase.getPrefixQuery could detect when the query is a pure wildcard 
(a single asterisk) and then generate a FieldValueFilter instead of a 
PrefixQuery. My understanding from LUCENE-4376 is that the following would work:

{code}
new ConstantScoreQuery(new FieldValueFilter(fieldname, false))
{code}

Oh, and the check for whether "leading wildcard" is enabled would need to be 
bypassed for this case.
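
To make the intended order of checks concrete, here is a minimal sketch of the 
decision only. It is not Lucene code and uses no Lucene classes: the class 
PureWildcardSketch, the Plan enum, and the choosePlan method are hypothetical 
names invented for illustration. It assumes the parser can see the raw term text 
and the "leading wildcard" setting at the point where it decides.

```java
// Hypothetical stand-in for the decision the parser would make. It does not
// use any Lucene classes; every name here is illustrative only.
class PureWildcardSketch {

    /** Labels for what the parser would do with a term (not real query objects). */
    enum Plan { FIELD_VALUE_FILTER, REJECTED_LEADING_WILDCARD, EXISTING_BEHAVIOR }

    /** True only for a term that is exactly one asterisk. */
    static boolean isPureWildcard(String rawTerm) {
        return "*".equals(rawTerm);
    }

    static Plan choosePlan(String rawTerm, boolean allowLeadingWildcard) {
        // Checked first, so the "leading wildcard" setting is bypassed
        // for the pure wildcard case.
        if (isPureWildcard(rawTerm)) {
            return Plan.FIELD_VALUE_FILTER;
        }
        // Any other term that starts with an asterisk still honors the setting.
        if (!allowLeadingWildcard && rawTerm.startsWith("*")) {
            return Plan.REJECTED_LEADING_WILDCARD;
        }
        // Everything else is left to whatever the parser does today.
        return Plan.EXISTING_BEHAVIOR;
    }

    public static void main(String[] args) {
        System.out.println(choosePlan("*", false));
        System.out.println(choosePlan("*abc", false));
        System.out.println(choosePlan("abc*", false));
    }
}
```

The point of the ordering is that the pure wildcard test runs before the 
leading wildcard test, so a single asterisk is accepted even when leading 
wildcards are disabled, while a term such as *abc is still rejected.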

I still think it would be better to have PrefixQuery perform this optimization 
internally so that all apps would benefit, but this should be sufficient to 
address the main concern.

This change would benefit the classic Lucene query parser and other query 
parsers based on it, including edismax. Query parsers that are not based on it 
would not pick up the change, but they can be updated separately.

How much performance benefit? Unknown, but supposedly significant. The goal is 
simply to make a pure wildcard the obvious tool for selecting documents that 
have a value in a field.

