Erik Hatcher wrote:
> One interesting option is to subclass QueryParser and override getFieldQuery. When the field is "tag", return a FilteredQuery (see trunk codebase, or the nightly 1.9 binaries) using a Filter that interfaces with your database. Caching of the filters would be desirable for performance reasons.

Aha. That does sound like it could work, although it will be an interesting exercise in trickery.

I'm not sure it would entirely work at the getFieldQuery level; it might have to happen at the getBooleanQuery level instead. The reasoning is this:

  text:camel AND tag:zoo

    This needs to become a single FilteredQuery with a TermQuery
    (text:camel) and a TagFilter (tag:zoo).

  text:camel OR tag:zoo

    This needs to become a BooleanQuery.  The TermQuery (text:camel)
    would be optional, and the other query would be a FilteredQuery which
    filters a MatchAllDocsQuery with a TagFilter (tag:zoo).

  text:camel NOT tag:zoo

    This would be a FilteredQuery with a TermQuery (text:camel) and
    a NotFilter over a TagFilter(tag:zoo).
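
A minimal sketch of what the three rewrites above should compute, assuming document ids are modelled as a java.util.BitSet. The class and method names are hypothetical and are not Lucene API; the sketch only checks the set logic, not how FilteredQuery or a TagFilter would actually be built.

```java
import java.util.BitSet;

// Hypothetical model: each BitSet holds the ids of matching documents.
class TagRewriteSketch {

    // text:camel AND tag:zoo
    // Models a FilteredQuery: the text hits restricted by the tag filter.
    static BitSet andCase(BitSet textHits, BitSet tagBits) {
        BitSet result = (BitSet) textHits.clone();
        result.and(tagBits);
        return result;
    }

    // text:camel OR tag:zoo
    // Models a BooleanQuery with two optional clauses: the text hits, and
    // a match-all query restricted by the tag filter.
    static BitSet orCase(BitSet textHits, BitSet tagBits, int maxDoc) {
        BitSet allDocs = new BitSet(maxDoc);
        allDocs.set(0, maxDoc);                 // stands in for MatchAllDocsQuery
        BitSet filteredAll = andCase(allDocs, tagBits);
        BitSet result = (BitSet) textHits.clone();
        result.or(filteredAll);
        return result;
    }

    // text:camel NOT tag:zoo
    // Models a FilteredQuery whose filter is the negation of the tag filter.
    static BitSet notCase(BitSet textHits, BitSet tagBits) {
        BitSet result = (BitSet) textHits.clone();
        result.andNot(tagBits);
        return result;
    }
}
```

With text hits {0, 1, 2} and tag hits {2, 3} in a five-document index, the three cases give {2}, {0, 1, 2, 3} and {0, 1} respectively.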

It's complicated, but it seems like it would work. The only really hard cases are those with multiple non-text-index queries, where I might have to use an AndFilter or similar. And where there are only non-text-index queries, I would have to insert a MatchAllDocsQuery automatically.
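
As a rough sketch of that planning step for the all-clauses-required case only: the helper below takes the text clauses and the tag clauses of a conjunction and returns a string describing the query structure it would build. Everything here is hypothetical (the class, the method, and the string output); it builds no real Lucene objects and only shows when an AndFilter and a MatchAllDocsQuery would come into play.

```java
import java.util.List;

// Hypothetical planner for a conjunction of text and tag clauses.
class ConjunctionPlanSketch {

    static String plan(List<String> textClauses, List<String> tagClauses) {
        // Choose the query part: match-all when there is no text clause.
        String query;
        if (textClauses.isEmpty()) {
            query = "MatchAllDocsQuery";
        } else if (textClauses.size() == 1) {
            query = textClauses.get(0);
        } else {
            query = "BooleanQuery(+" + String.join(" +", textClauses) + ")";
        }

        // No tag clauses: nothing to filter, so the plain query is enough.
        if (tagClauses.isEmpty()) {
            return query;
        }

        // Choose the filter part: combine several tag filters with an AndFilter.
        String filter;
        if (tagClauses.size() == 1) {
            filter = "TagFilter(" + tagClauses.get(0) + ")";
        } else {
            StringBuilder sb = new StringBuilder("AndFilter(");
            for (int i = 0; i < tagClauses.size(); i++) {
                if (i > 0) {
                    sb.append(", ");
                }
                sb.append("TagFilter(").append(tagClauses.get(i)).append(")");
            }
            filter = sb.append(")").toString();
        }
        return "FilteredQuery(" + query + ", " + filter + ")";
    }
}
```

Mixed OR and NOT clauses would need more cases than this; the sketch deliberately leaves them out.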

> In the latest codebase, there is a MatchAllDocsQuery that can be used in this case. I also have implemented this sort of thing with a custom query parser for a client.

This sounds interesting in itself. I was trying to write one of these myself, not realising that it had been added into source control recently. My plan was to get the query returning all docs and then figure out how to abstract it so that it could filter the returned docs down on the fly.

I may yet be able to use MatchAllDocsQuery as a means for doing this, as it will contain a lot of the framework code which I was finding hard to write myself. (Having to write a Query, Weight and Scorer class each time is something I wanted to abstract away from our own custom queries.)

My main motivation for wanting to use a "real" query as opposed to a FilteredQuery is that filters cost more up-front, and if you cache them then they start costing in memory (our indexes are huge, therefore they cost a LOT of memory.) Real queries are more or less a BitSet implemented as an iterator, which is far preferable for us.
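
To illustrate the difference, here is a minimal sketch of the "BitSet implemented as an iterator" idea: matching documents are produced one at a time by testing each id on demand, so nothing proportional to the index size is held in memory. The class name and the predicate are assumptions for illustration, not Lucene's Scorer API, and the predicate stands in for whatever lookup decides whether a document carries the tag.

```java
import java.util.function.IntPredicate;

// Hypothetical lazy iterator over matching document ids.
class LazyDocIterator {
    private final int maxDoc;
    private final IntPredicate matches;
    private int doc = -1;

    LazyDocIterator(int maxDoc, IntPredicate matches) {
        this.maxDoc = maxDoc;
        this.matches = matches;
    }

    // Advances to the next matching document id.
    // Returns -1 once every id below maxDoc has been tested.
    int next() {
        while (++doc < maxDoc) {
            if (matches.test(doc)) {
                return doc;
            }
        }
        doc = maxDoc; // stay exhausted on repeated calls
        return -1;
    }
}
```

The trade-off is that the per-document test runs every time the query runs, whereas a cached filter pays that cost once and then holds one bit per document.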

Daniel


--
Daniel Noll

Nuix Australia Pty Ltd
Suite 79, 89 Jones St, Ultimo NSW 2007, Australia
Phone: (02) 9280 0699
Fax:   (02) 9212 6902

