> Author documents also have other attributes, for example "Weight". I want a
> query that gives every book document authored by people weighing more than
> 200lbs, with the ability of doing faceting and the like.
For that you should use a range query, as weight is a numeric value: the new
NumericRangeQuery in the not-yet-released Lucene 2.9 can do this very fast.
You only need to index the weight using NumericField/NumericTokenStream.

> Grant Ingersoll-6 wrote:
> >
> > This strikes me as an example of
> > http://people.apache.org/~hossman/#xyproblem
> > Namely, you've declared the solution you would like, but haven't
> > told us the problem.
> >
> > I highly doubt that double loop is going to scale. It wouldn't scale
> > in a database, either, so it makes me think we need to take a step
> > back and ask a bit more about the problem you are trying to solve and
> > not the solution. Can you share more details about it?
> >
> > On Jul 26, 2009, at 6:14 PM, Edoardo Marcora wrote:
> >
> >>
> >> type:foo and type:bar are fields used to represent documents of
> >> different "kind" (it could be "author" and "book"). field2 and field1
> >> contain IDs which I would like to use to join the two "kinds".
> >>
> >>
> >> Ken Krugler wrote:
> >>>
> >>>> awarnier wrote:
> >>>>>
> >>>>> Edoardo Marcora wrote:
> >>>>>> I am faced with the requirement for a boolean query composed of
> >>>>>> 50,000 clauses (all of them directed at the same field) all OR'ed
> >>>>>> together.
> >>>>>>
> >>>>> By pure intellectual curiosity: can you provide some idea of the
> >>>>> type of query, and the type of content of the field this is
> >>>>> targeted at?
> >>>>> I have this notion that with 50,000 queries directed at one field,
> >>>>> there must be some smarter way of handling this than just OR-ing
> >>>>> together the results.
> >>>>>
> >>>>>
> >>>>
> >>>> What I would like to do is to take the results of one query and
> >>>> use one of its fields (not the docid) as an argument to another
> >>>> query (much like a subquery in SQL).
> >>>> For example:
> >>>>
> >>>> type:foo AND (_query_:type:bar AND field2:{field1})
> >>>>
> >>>> This should search for all types of foo and then iterate over the
> >>>> result set and perform a query for where type is bar and field2 is
> >>>> equal to the value of field1 from each item of the first result set.
> >>>
> >>> This looks like a more like this (MLT) query, where you restrict the
> >>> set to documents that have matching types...though I don't understand
> >>> the type:foo AND type:bar query, unless 'type' is a multi-value
> >>> field.
> >>>
> >>> From what I remember of using MLT support in Lucene a few years back,
> >>> this takes the terms of the target field from the target document,
> >>> tosses out stop words, and then uses some arbitrary limit (e.g. 500)
> >>> for the first N terms used to do the query.
> >>>
> >>> Unless the distribution of terms in the field is heavily skewed, this
> >>> gives you pretty good results. I suppose you could use the N most
> >>> common terms - but if your stop word list isn't good, you'll get worse
> >>> results.
> >>>
> >>> In any case, preprocessing the field will speed things up, versus
> >>> doing any analysis/stop word/frequency calculations at query time.
> >>>
> >>> -- Ken
> >>> --
> >>> Ken Krugler
> >>> <http://ken-blog.krugler.org>
> >>> +1 530-265-2225
> >>>
> >>
> >> --
> >> View this message in context:
> >> http://www.nabble.com/Boolean-query-with-50%2C000-clauses%21-Possible--Scalable--tp24664839p24671050.html
> >> Sent from the Lucene - General mailing list archive at Nabble.com.
> >> > > > > -------------------------- > > Grant Ingersoll > > http://www.lucidimagination.com/ > > > > Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) > > using Solr/Lucene: > > http://www.lucidimagination.com/search > > > > > > > > -- > View this message in context: http://www.nabble.com/Boolean-query-with- > 50%2C000-clauses%21-Possible--Scalable--tp24664839p24701672.html > Sent from the Lucene - General mailing list archive at Nabble.com.
