Re: Boolean query with 50,000 clauses! Possible? Scalable?

Ken Krugler Sun, 26 Jul 2009 14:19:03 -0700

awarnier wrote:


 Edoardo Marcora wrote:

 I am faced with the requirement for a boolean query composed of 50,000
 clauses (all of them directed at the same field) all OR'ed together.

 By pure intellectual curiosity : can you provide some idea of the type
 of query, and the type of content of the field this is targeted at ?
 I have this notion that with 50,000 queries directed at one field, there
 must be some smarter way of handling this than just OR-ing together the
 results.


What I would like to do is to take the results of one query and use one of
its fields (not the docid) as an argument to another query (much like a
subquery in SQL). For example:

type:foo AND (_query_:type:bar AND field2:{field1})

This should search for all types of foo and then iterate over the result set
and perform a query for where type is bar and field2 is equal to the value
of field1 from each item of the first result set.
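
A minimal sketch of the two-pass idea described above, done on the client side rather than inside the query parser. The in-memory list of dicts stands in for the first result set, and the helper name build_subquery and the max_clauses parameter are hypothetical; the 1024 default mirrors Lucene's default BooleanQuery clause limit, which a 50,000-clause query would need to have raised.

```python
def build_subquery(docs, outer_type, inner_type, max_clauses=1024):
    """Build the second query from the field1 values of the first result set."""
    # Pass 1: collect the distinct field1 values of documents of the outer type.
    values = sorted(
        {d["field1"] for d in docs if d.get("type") == outer_type and "field1" in d}
    )
    if len(values) > max_clauses:
        raise ValueError(
            "%d clauses exceeds the limit of %d" % (len(values), max_clauses)
        )
    # Pass 2: OR the values together against field2, quoting each as a phrase.
    quoted = ['"%s"' % v.replace("\\", "\\\\").replace('"', '\\"') for v in values]
    return "type:%s AND field2:(%s)" % (inner_type, " OR ".join(quoted))


if __name__ == "__main__":
    first_results = [
        {"type": "foo", "field1": "a"},
        {"type": "foo", "field1": "b"},
        {"type": "bar", "field2": "a"},
    ]
    print(build_subquery(first_results, "foo", "bar"))
```

Deduplicating the values first keeps the clause count down when many documents of the first type share the same field1 value.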

This looks like a more like this (MLT) query, where you restrict the set to documents that have matching types...though I don't understand the type:foo AND type:bar query, unless 'type' is a multi-value field.

From what I remember of using MLT support in Lucene a few years back, this takes the terms of the target field from the target document, tosses out stop words, and then uses some arbitrary limit (e.g. 500) for the first N terms used to do the query.

Unless the distribution of terms in the field is heavily skewed, this gives you pretty good results. I suppose you could use the N most common terms - but if your stop word list isn't good, you'll get worse results.
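
A rough sketch of the term selection described above, not Lucene's actual MLT implementation. The whitespace tokenizer, the tiny stop word list, and the names select_terms and or_query are all assumptions made for illustration; the 500-term cap is the example limit mentioned above.

```python
from collections import Counter

# Illustrative stop word list only; a real one would be much longer.
STOP_WORDS = frozenset({"a", "an", "and", "of", "the", "to"})


def select_terms(text, max_terms=500, stop_words=STOP_WORDS, most_common=False):
    """Pick at most max_terms distinct non-stop-word terms from a field value."""
    tokens = [t for t in text.lower().split() if t not in stop_words]
    if most_common:
        # Variant: keep the N most frequent terms instead of the first N.
        return [term for term, _ in Counter(tokens).most_common(max_terms)]
    selected, seen = [], set()
    for token in tokens:
        if token not in seen:
            seen.add(token)
            selected.append(token)
            if len(selected) == max_terms:
                break
    return selected


def or_query(field, terms):
    """OR the selected terms together against a single field."""
    return " OR ".join("%s:%s" % (field, term) for term in terms)


if __name__ == "__main__":
    terms = select_terms("the cat and the dog and the cat", max_terms=2)
    print(or_query("body", terms))
```

With most_common=True, a weak stop word list lets very frequent filler words crowd out the informative terms, which is the risk noted above.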

In any case, preprocessing the field will speed things up, versus doing any analysis/stop word/frequency calculations at query time.


-- Ken
--
Ken Krugler
<http://ken-blog.krugler.org>
+1 530-265-2225
