[jira] [Updated] (SOLR-6318) QParser for TermsFilter

David Smiley (JIRA) Tue, 05 Aug 2014 20:46:23 -0700

     [ 
https://issues.apache.org/jira/browse/SOLR-6318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


David Smiley updated SOLR-6318:
-------------------------------

    Attachment: SOLR-6318__terms_QParser.patch

Here it is, with test.
From the javadoc:

{quote}
Finds documents whose specified field has any of the specified values. It's 
like TermQParserPlugin but multi-valued, and supports a variety of internal 
algorithms. Parameters:
* f: The field name (mandatory)
* separator: the separator delimiting the values in the query string. By 
default it's a " " which is special in that it splits on any consecutive 
whitespace.
* method: Any of termsFilter (default), booleanQuery, automaton, 
docValuesTermsFilter.

Note that if no values are specified then the query matches no documents.
{quote}
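
To make the parameters concrete, here is a minimal sketch of assembling such a filter query on the client side. It assumes the parser is registered under the proposed name "terms" and that local-params values may be double-quoted; the helper build_terms_fq and the METHODS tuple are hypothetical names used only for illustration, not part of the patch.

```python
# Hypothetical client-side helper; "terms" is the parser name proposed in
# this issue, and the parameter names (f, separator, method) come from the
# javadoc quoted above.
METHODS = ("termsFilter", "booleanQuery", "automaton", "docValuesTermsFilter")


def build_terms_fq(field, values, separator=" ", method=None):
    """Return the raw (not URL-encoded) fq value for the terms parser."""
    if not field:
        raise ValueError("f (the field name) is mandatory")
    if method is not None and method not in METHODS:
        raise ValueError("unknown method: %s" % method)

    params = ["f=" + field]
    if separator != " ":
        # Assumption: quoting the value keeps characters such as "," intact.
        # A separator that itself contains a double quote is not handled.
        params.append('separator="%s"' % separator)
    if method is not None:
        params.append("method=" + method)

    return "{!terms " + " ".join(params) + "}" + separator.join(values)


print(build_terms_fq("myfield", ["code1", "code2", "code3"]))
print(build_terms_fq("myfield", ["a", "b"], separator=",", method="automaton"))
```

The returned string still has to be URL-encoded before it goes into an HTTP request, and the helper does not check whether a value contains the separator.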

It would be cool if somebody did some benchmarking that would allow us to 
choose between some of the algorithms based on heuristics, for example using 
method=X when the number of values exceeds some threshold, or using 
docValuesTermsFilter if docValues is enabled. But this is fine for now.  Note 
that DocValuesTermsFilter (trunk) is known as FieldCacheTermsFilter on 4x.  On 
4x this feature doesn't support DocValues (just FieldCache), whereas on trunk 
it supports both, depending on whether you indexed DocValues or not (I think).  
That method is also limited to single-valued fields, but there's no explicit 
check.
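
The kind of heuristic meant here could look roughly like the sketch below. Everything numeric in it is a placeholder: no benchmarking has been done, so the threshold of 1000 and the choice of "automaton" for large value sets are assumptions for illustration only, and choose_method is a hypothetical name.

```python
def choose_method(num_values, doc_values_enabled, single_valued,
                  threshold=1000, large_method="automaton"):
    """Pick a value for the 'method' parameter.

    threshold and large_method are unbenchmarked placeholders.
    """
    # docValuesTermsFilter is limited to single-valued fields, and the
    # parser itself does no explicit check, so guard against it here.
    if doc_values_enabled and single_valued:
        return "docValuesTermsFilter"
    if num_values > threshold:
        return large_method
    # termsFilter is the documented default.
    return "termsFilter"


print(choose_method(5, doc_values_enabled=True, single_valued=True))
print(choose_method(5000, doc_values_enabled=False, single_valued=True))
```

Real cutoffs would have to come from measurements on representative indexes.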

I'll commit this in a couple days, pending input.

> QParser for TermsFilter
> -----------------------
>
>                 Key: SOLR-6318
>                 URL: https://issues.apache.org/jira/browse/SOLR-6318
>             Project: Solr
>          Issue Type: New Feature
>          Components: query parsers
>            Reporter: David Smiley
>            Assignee: David Smiley
>             Fix For: 4.10
>
>         Attachments: SOLR-6318__terms_QParser.patch
>
>
> Some applications require filtering documents by a large number of terms.  
> It's often related to security filtering.  Naively this is done this way:
> {noformat}
>     fq={!df=myfield q.op=OR}code1 code2 code3 code4 code5...
> {noformat}
> And this ends up being a BooleanQuery.  Users then wind up hitting 
> BooleanQuery.maxClauseCount (sometimes in production, sadly) and they up it to 
> a huge number to get the job done.
> Solr should offer a QParser based on TermsFilter.  I propose it be named 
> "terms" (plural of term), and have a "separator" option defaulting to a 
> space.  When it's a space, the values also get trimmed, which wouldn't 
> otherwise happen.  The analysis logic should be the same as that for "term" 
> QParser which is to call FieldType.readableToIndexed.
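
The splitting rule described in the issue (a space separator splits on any run of whitespace and trims; any other separator splits literally, without trimming) can be illustrated with a small sketch. This is not the patch's Java code, only an approximation of the described behavior, and split_values is a hypothetical name.

```python
def split_values(query_string, separator=" "):
    """Split the query string into raw values, before any field analysis."""
    if separator == " ":
        # Special case: split on consecutive whitespace, which also trims
        # leading and trailing whitespace and drops empty values.
        return query_string.split()
    if query_string == "":
        # No values specified: the resulting query matches no documents.
        return []
    # Any other separator: literal split, values are NOT trimmed.
    return query_string.split(separator)


print(split_values("  code1   code2 code3 "))
print(split_values("code1, code2", separator=","))
```

With a comma separator the second value keeps its leading space, which is why untrimmed input can silently fail to match.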


