[
https://issues.apache.org/jira/browse/LUCENE-4386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455418#comment-13455418
]
Uwe Schindler commented on LUCENE-4386:
---------------------------------------
The reason for my commit is not that it is "unsafe" or anything like that. It is
simply that this filter needs the FieldCache, which has a large performance
impact on the first call when the filter is built automatically by the
QueryParser.
I am strongly against adding this to Lucene's QueryParser by default. Solr
already has support for *:* and similar, so it could use this filter in its own
QueryParser impl (as a replacement for the current ConstantScore RangeQuery,
which is slow).
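
A minimal sketch of the opt-in idea, in plain Java with no Lucene dependency.
The Plan enum, the isPureWildcard and plan methods, and the optIn flag are
all hypothetical names used for illustration; they are not Lucene or Solr API.
The sketch only models the decision (use the field-value filter when the
parser has explicitly opted in and the term is a single asterisk), not the
construction of the real filter:

```java
import java.util.Objects;

class PureWildcardSketch {

    /** Hypothetical stand-ins for what a parser could decide to build. */
    enum Plan {
        FIELD_VALUE_FILTER, // "field has any value" via the FieldCache-backed filter
        DEFAULT_QUERY       // whatever the parser would normally generate
    }

    /** True only for a term that is exactly one asterisk. */
    static boolean isPureWildcard(String termText) {
        return "*".equals(termText);
    }

    /**
     * Chooses a plan for one parsed term. The optimization is off unless
     * the caller opts in, because the first use pays the FieldCache cost.
     */
    static Plan plan(String termText, boolean optIn) {
        Objects.requireNonNull(termText, "termText");
        if (optIn && isPureWildcard(termText)) {
            return Plan.FIELD_VALUE_FILTER;
        }
        return Plan.DEFAULT_QUERY;
    }

    public static void main(String[] args) {
        System.out.println(plan("*", true));     // parser that opted in
        System.out.println(plan("*", false));    // default parser behavior
        System.out.println(plan("foo*", true));  // not a pure wildcard
    }
}
```

With this shape, a parser that never sets optIn keeps its current behavior,
and only a parser that explicitly enables it takes the FieldCache hit.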
> Query parser should generate FieldValueFilter for pure wildcard terms to
> boost query performance
> ------------------------------------------------------------------------------------------------
>
> Key: LUCENE-4386
> URL: https://issues.apache.org/jira/browse/LUCENE-4386
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/queryparser
> Affects Versions: 4.0-BETA
> Reporter: Jack Krupansky
> Fix For: 4.0
>
>
> In theory, a simple pure wildcard query (a single asterisk) is an inefficient
> way to select all documents that have any value in a field. Rather than users
> having to work around this issue by adding a separate boolean "has" field, it
> would be better to have the query parser directly generate the most efficient
> Lucene query for detecting all documents that have any value for a specified
> field. According to the discussion over on LUCENE-4376, the FieldValueFilter
> is the proper solution.
> Proposed solution:
> QueryParserBase.getPrefixQuery could detect when the query is a pure wildcard
> (a single asterisk) and then generate a FieldValueFilter instead of a
> PrefixQuery. My understanding from LUCENE-4376 is that the following would
> work:
> {code}
> new ConstantScoreQuery(new FieldValueFilter(fieldname, false))
> {code}
> Oh, and the check for whether "leading wildcard" is enabled would need to be
> bypassed for this case.
> I still think it would be better to have PrefixQuery perform this
> optimization internally so that all apps would benefit, but this should be
> sufficient to address the main concern.
> This change would improve the classic Lucene query parser and other
> query parsers based on it, including edismax. There might be other query
> parsers which won't see the impact of this change, but they can be updated
> separately.
> How much performance benefit? Unknown, but supposedly significant. The goal
> is simply to have a simple pure wildcard be the obvious tool to select
> documents that have a value in a field.