Re: Queries not derived from the text index

Daniel Noll Wed, 08 Feb 2006 15:48:01 -0800

Erik Hatcher wrote:

One interesting option is to subclass QueryParser and override getFieldQuery. When the field is "tag", return a FilteredQuery (see trunk codebase, or the nightly 1.9 binaries) using a Filter that interfaces with your database. Caching of the filters would be desirable for performance reasons.

Aha. That does sound like it could work, although it will be an interesting exercise in trickery.

I'm not sure it would entirely work at the getFieldQuery level; perhaps it would at the getBooleanQuery level. The reasoning is this...


  text:camel AND tag:zoo

    This needs to become a single FilteredQuery with a TermQuery
    (text:camel) and a TagFilter (tag:zoo).

  text:camel OR tag:zoo

    This needs to become a BooleanQuery.  The TermQuery (text:camel)
    would be optional, and the other query would be a FilteredQuery which
    filters a MatchAllDocsQuery with a TagFilter (tag:zoo).

  text:camel NOT tag:zoo

    This would be a FilteredQuery with a TermQuery (text:camel) and
    a NotFilter over a TagFilter(tag:zoo).
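
To make the three cases above concrete, here is a rough sketch of what each rewrite should match, with every query and filter result modelled as a java.util.BitSet of document ids. This is not Lucene code: the class TagRewriteSketch, its methods and the sample doc ids are all made up for illustration, and it only shows the set logic, not scoring.

```java
import java.util.BitSet;

// Hypothetical sketch, not Lucene API: results are plain BitSets of doc ids.
class TagRewriteSketch {

    // text:camel AND tag:zoo -> text hits restricted by the tag filter.
    static BitSet and(BitSet textHits, BitSet tagBits) {
        BitSet result = (BitSet) textHits.clone();
        result.and(tagBits);
        return result;
    }

    // text:camel OR tag:zoo -> text hits, plus "all docs" restricted by the tag filter.
    static BitSet or(BitSet textHits, BitSet tagBits, int maxDoc) {
        BitSet allDocsFiltered = new BitSet(maxDoc);
        allDocsFiltered.set(0, maxDoc);   // stands in for a match-all query
        allDocsFiltered.and(tagBits);     // filtered down to the tagged docs
        BitSet result = (BitSet) textHits.clone();
        result.or(allDocsFiltered);
        return result;
    }

    // text:camel NOT tag:zoo -> text hits restricted by the inverted tag filter.
    static BitSet not(BitSet textHits, BitSet tagBits) {
        BitSet result = (BitSet) textHits.clone();
        result.andNot(tagBits);
        return result;
    }

    // Small helper to build a BitSet from doc ids.
    static BitSet bits(int... docs) {
        BitSet b = new BitSet();
        for (int d : docs) {
            b.set(d);
        }
        return b;
    }

    public static void main(String[] args) {
        BitSet camel = bits(0, 2, 3);   // assumed hits for text:camel
        BitSet zoo = bits(2, 4);        // assumed docs tagged "zoo" in the database
        System.out.println(and(camel, zoo));     // {2}
        System.out.println(or(camel, zoo, 5));   // {0, 2, 3, 4}
        System.out.println(not(camel, zoo));     // {0, 3}
    }
}
```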

It's complicated, but it seems like it would work. The only cases which become really hard are cases where there are multiple non-text-index queries in there. Then I might have to use an AndFilter or similar. And in cases where there are only non-text-index queries in there I would have to automatically insert a MatchAllDocsQuery.
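
The two harder cases might look roughly like this, again using BitSets as a stand-in rather than real Lucene classes. The class MultiTagSketch and its methods are invented for the example. A null textHits argument is my own convention for "no text-index clause in the query", which is where the match-all base gets inserted.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

// Hypothetical sketch, not Lucene API.
class MultiTagSketch {

    // Plays the role of an "AndFilter": intersect all the tag filters.
    static BitSet andFilter(List<BitSet> filters, int maxDoc) {
        BitSet result = new BitSet(maxDoc);
        result.set(0, maxDoc);
        for (BitSet f : filters) {
            result.and(f);
        }
        return result;
    }

    // textHits == null means the query had no text-index clause at all,
    // so a match-all base is inserted automatically.
    static BitSet search(BitSet textHits, List<BitSet> tagFilters, int maxDoc) {
        BitSet base;
        if (textHits == null) {
            base = new BitSet(maxDoc);
            base.set(0, maxDoc);
        } else {
            base = (BitSet) textHits.clone();
        }
        base.and(andFilter(tagFilters, maxDoc));
        return base;
    }

    // Small helper to build a BitSet from doc ids.
    static BitSet bits(int... docs) {
        BitSet b = new BitSet();
        for (int d : docs) {
            b.set(d);
        }
        return b;
    }

    public static void main(String[] args) {
        BitSet camel = bits(1, 2, 5);   // assumed hits for text:camel
        BitSet zoo = bits(1, 2, 3);     // assumed docs tagged "zoo"
        BitSet farm = bits(2, 3, 5);    // assumed docs tagged "farm"
        List<BitSet> tags = Arrays.asList(zoo, farm);
        System.out.println(search(camel, tags, 6));   // {2}
        System.out.println(search(null, tags, 6));    // {2, 3}
    }
}
```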

In the latest codebase, there is a MatchAllDocsQuery that can be used in this case. I also have implemented this sort of thing with a custom query parser for a client.

This sounds interesting in itself. I was trying to write one of these myself, not realising that it had been added into source control recently. My plan was to get the query returning all docs and then figure out how to abstract it so that it could filter the returned docs down on the fly.

I may yet be able to use MatchAllDocsQuery as a means for doing this, as it will contain a lot of the framework code which I was finding it hard to write myself (having to write a Query, Weight and Scorer class is something I wanted to try and abstract away from our own custom ones.)

My main motivation for wanting to use a "real" query as opposed to a FilteredQuery is that filters cost more up-front, and if you cache them then they start costing in memory (our indexes are huge, therefore they cost a LOT of memory.) Real queries are more or less a BitSet implemented as an iterator, which is far preferable for us.
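
As a rough picture of what I mean by "a BitSet implemented as an iterator": instead of filling one bit per document up-front, the matching doc ids are produced one at a time as the caller asks for them. The sketch below is plain Java, not a Lucene Scorer. LazyTagIterator is an invented name, and the hasTag predicate stands in for whatever per-document lookup the database would provide.

```java
import java.util.NoSuchElementException;
import java.util.PrimitiveIterator;
import java.util.function.IntPredicate;

// Hypothetical sketch, not Lucene API: walks doc ids lazily and keeps only
// the current position in memory, rather than a bit per document.
class LazyTagIterator implements PrimitiveIterator.OfInt {
    private final int maxDoc;
    private final IntPredicate hasTag;   // assumed per-document tag lookup
    private int next;

    LazyTagIterator(int maxDoc, IntPredicate hasTag) {
        this.maxDoc = maxDoc;
        this.hasTag = hasTag;
        this.next = advance(0);
    }

    // Find the first matching doc id at or after 'from', or maxDoc if none.
    private int advance(int from) {
        int doc = from;
        while (doc < maxDoc && !hasTag.test(doc)) {
            doc++;
        }
        return doc;
    }

    @Override
    public boolean hasNext() {
        return next < maxDoc;
    }

    @Override
    public int nextInt() {
        if (next >= maxDoc) {
            throw new NoSuchElementException();
        }
        int current = next;
        next = advance(current + 1);
        return current;
    }

    public static void main(String[] args) {
        // Pretend every third document carries the tag.
        LazyTagIterator it = new LazyTagIterator(10, doc -> doc % 3 == 0);
        while (it.hasNext()) {
            System.out.println(it.nextInt());   // prints 0, 3, 6, 9 on separate lines
        }
    }
}
```

The trade-off is that the lookup runs again on every search, whereas a cached filter pays once and then holds the memory.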


Daniel


--
Daniel Noll

Nuix Australia Pty Ltd
Suite 79, 89 Jones St, Ultimo NSW 2007, Australia
Phone: (02) 9280 0699
Fax:   (02) 9212 6902

