[jira] [Commented] (LUCENE-4376) Add Query subclasses for selecting documents where a field is empty or not

Uwe Schindler (JIRA) Wed, 12 Sep 2012 00:49:12 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-4376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453811#comment-13453811 ]


Uwe Schindler commented on LUCENE-4376:
---------------------------------------

The filter already exists; only QueryParser lacks support for it. To make 
this work for your use case, you can override Lucene's/Solr's QueryParser to 
return a ConstantScoreQuery wrapping the LUCENE-3593 filter in place of the 
"field:*"-only query. The filter's boolean argument selects the positive or 
negative variant.

To conclude: the query already exists, so the two new classes are not needed. 
The desired functionality is: 
{code:java}
// field is the String field name; negate is a boolean
new ConstantScoreQuery(new FieldValueFilter(field, negate))
{code}

To find all documents with any term in the field, use negate=false; to find 
documents with no term in the field, use negate=true. There is absolutely no 
need for a new Query class.
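
The effect of the negate flag can be pictured with a toy model that uses only 
the JDK, not Lucene. It assumes, as a simplification, that the filter works 
from a per-field bit set marking which documents have a value. The class 
DocsWithFieldSketch and its accept method are hypothetical names for this 
illustration, not Lucene API.

```java
import java.util.BitSet;

// Toy model only: not Lucene code. All names here are hypothetical.
final class DocsWithFieldSketch {

    // Returns the documents the toy filter accepts.
    // docsWithField: bit i is set when document i has a value in the field.
    // maxDoc: total number of documents in the toy index.
    // negate=false keeps documents with a value; negate=true keeps those without.
    static BitSet accept(BitSet docsWithField, int maxDoc, boolean negate) {
        BitSet result = (BitSet) docsWithField.clone();
        if (negate) {
            result.flip(0, maxDoc); // invert membership for documents 0..maxDoc-1
        }
        return result;
    }

    public static void main(String[] args) {
        BitSet docsWithField = new BitSet();
        docsWithField.set(0);
        docsWithField.set(2);
        docsWithField.set(3);
        System.out.println(accept(docsWithField, 5, false)); // prints {0, 2, 3}
        System.out.println(accept(docsWithField, 5, true));  // prints {1, 4}
    }
}
```

In this model both variants cost one pass over a bit set, regardless of how 
many distinct terms the field contains.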

bq. Okay, so would it be straightforward and super-efficient for PrefixQuery to 
do exactly that if the prefix term is zero-length?

That's super-slow, as it will search for all terms in the field. This is what 
Solr, for example, currently does for "field:*" queries. Solr should use the 
filter too; that would make those queries much more efficient.
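
A rough way to see the cost of the zero-length prefix is a JDK-only toy model, 
not Lucene code. It assumes the field's term dictionary is a sorted map from 
term to document ids; EmptyPrefixCostSketch and unionAllTerms are hypothetical 
names. Because every term matches an empty prefix, every postings list has to 
be read and merged.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Toy model only: not Lucene code. All names here are hypothetical.
final class EmptyPrefixCostSketch {

    // Emulates a zero-length prefix over one field: visits every term,
    // merges its document ids into 'hits', and returns the number of terms visited.
    static int unionAllTerms(SortedMap<String, int[]> terms, BitSet hits) {
        int termsVisited = 0;
        for (Map.Entry<String, int[]> entry : terms.entrySet()) {
            termsVisited++;
            for (int doc : entry.getValue()) {
                hits.set(doc);
            }
        }
        return termsVisited;
    }

    public static void main(String[] args) {
        SortedMap<String, int[]> terms = new TreeMap<>();
        terms.put("apple", new int[] {0, 2});
        terms.put("banana", new int[] {2});
        terms.put("cherry", new int[] {0, 3});

        BitSet hits = new BitSet();
        int visited = unionAllTerms(terms, hits);
        System.out.println(visited + " terms visited, hits " + hits); // prints 3 terms visited, hits {0, 2, 3}
    }
}
```

The work grows with the number of distinct terms in the field, which is why 
text fields with many terms are the worst case for this approach.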
                
> Add Query subclasses for selecting documents where a field is empty or not
> --------------------------------------------------------------------------
>
>                 Key: LUCENE-4376
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4376
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/query/scoring
>            Reporter: Jack Krupansky
>             Fix For: 5.0
>
>
> Users frequently wish to select documents based on whether a specified 
> sparsely-populated field has a value or not. Lucene should provide specific 
> Query subclasses that optimize for these two cases, rather than force users 
> to guess what workaround might be most efficient. It is simplest for users to 
> use a simple pure wildcard term to check for non-empty fields or a negated 
> pure wildcard term to check for empty fields, but it has been suggested that 
> this can be rather inefficient, especially for text fields with many terms.
> 1. Add NonEmptyFieldQuery - selects all documents that have a value for the 
> specified field.
> 2. Add EmptyFieldQuery - selects all documents that do not have a value for 
> the specified field.
> The query parsers could turn a pure wildcard query (asterisk only) into a 
> NonEmptyFieldQuery, and a negated pure wildcard query into an EmptyFieldQuery.
> Alternatively, maybe PrefixQuery could detect pure wildcard and automatically 
> "rewrite" it into NonEmptyFieldQuery.
> My assumption is that if the actual values of the field are not needed, 
> Lucene can much more efficiently simply detect whether values are present, 
> rather than, for example, the user having to create a separate boolean "has 
> value" field that they would query for true or false.

