Erik Hatcher wrote:
> One interesting option is to subclass QueryParser and override
> getFieldQuery. When the field is "tag", return a FilteredQuery (see
> trunk codebase, or the nightly 1.9 binaries) using a Filter that
> interfaces with your database. Caching of the filters would be
> desirable for performance reasons.
Aha. That does sound like it could work, although it will be an
interesting exercise in trickery.
I'm not sure it would entirely work at the getFieldQuery level; it might
have to happen at the getBooleanQuery level. The reasoning is this:
  text:camel AND tag:zoo
      This needs to become a single FilteredQuery with a TermQuery
      (text:camel) and a TagFilter (tag:zoo).
  text:camel OR tag:zoo
      This needs to become a BooleanQuery. The TermQuery (text:camel)
      would be optional, and the other clause would be a FilteredQuery
      which filters a MatchAllDocsQuery with a TagFilter (tag:zoo).
  text:camel NOT tag:zoo
      This would be a FilteredQuery with a TermQuery (text:camel) and
      a NotFilter over a TagFilter (tag:zoo).
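To check that the three rewrites above give the right documents, here is a
rough sketch that models each query and filter as a java.util.BitSet of
matching doc IDs. Nothing in it is Lucene API: TagRewriteSketch and its
methods are hypothetical names, and the BitSet operations only stand in for
what FilteredQuery, BooleanQuery and a NotFilter would do.

```java
import java.util.BitSet;

// Hypothetical model, not Lucene code: a BitSet marks the doc IDs that match.
class TagRewriteSketch {
    // text:camel AND tag:zoo -> FilteredQuery(TermQuery, TagFilter)
    static BitSet rewriteAnd(BitSet text, BitSet tag) {
        BitSet result = (BitSet) text.clone();
        result.and(tag);
        return result;
    }

    // text:camel OR tag:zoo
    //   -> BooleanQuery(TermQuery, FilteredQuery(MatchAllDocsQuery, TagFilter))
    static BitSet rewriteOr(BitSet text, BitSet tag, int maxDoc) {
        BitSet result = new BitSet(maxDoc);
        result.set(0, maxDoc);   // stands in for MatchAllDocsQuery
        result.and(tag);         // filtered by the tag filter
        result.or(text);         // optional text clause
        return result;
    }

    // text:camel NOT tag:zoo -> FilteredQuery(TermQuery, NotFilter(TagFilter))
    static BitSet rewriteNot(BitSet text, BitSet tag) {
        BitSet result = (BitSet) text.clone();
        result.andNot(tag);
        return result;
    }

    // Helper to build a BitSet from a list of doc IDs.
    static BitSet docs(int... ids) {
        BitSet bits = new BitSet();
        for (int id : ids) {
            bits.set(id);
        }
        return bits;
    }

    public static void main(String[] args) {
        BitSet text = docs(0, 2, 3);   // docs matching text:camel
        BitSet tag = docs(2, 4);       // docs tagged zoo
        System.out.println(rewriteAnd(text, tag));    // {2}
        System.out.println(rewriteOr(text, tag, 5));  // {0, 2, 3, 4}
        System.out.println(rewriteNot(text, tag));    // {0, 3}
    }
}
```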
It's complicated, but it seems like it would work. The only cases which
become really hard are cases where there are multiple non-text-index
queries in there. Then I might have to use an AndFilter or similar.
And in cases where there are only non-text-index queries in there I
would have to automatically insert a MatchAllDocsQuery.
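The two awkward cases can be sketched the same way, again with BitSets
standing in for queries and filters. This is only a model under my own
assumptions: ConjunctionSketch and Clause are hypothetical names, it handles
AND-ed clauses only, and the intersection of the non-text clauses plays the
role of the AndFilter mentioned above.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

// Hypothetical model, not Lucene code: combines AND-ed clauses only.
class ConjunctionSketch {
    // A clause is the set of docs it matches plus where it came from.
    static final class Clause {
        final BitSet docs;
        final boolean fromTextIndex;

        Clause(BitSet docs, boolean fromTextIndex) {
            this.docs = docs;
            this.fromTextIndex = fromTextIndex;
        }
    }

    static BitSet evaluate(List<Clause> clauses, int maxDoc) {
        BitSet query = null;    // intersection of text-index clauses
        BitSet filter = null;   // intersection of the others (the "AndFilter")
        for (Clause c : clauses) {
            if (c.fromTextIndex) {
                query = intersect(query, c.docs);
            } else {
                filter = intersect(filter, c.docs);
            }
        }
        if (query == null) {
            // Only non-text clauses: insert a match-all query to filter.
            query = new BitSet(maxDoc);
            query.set(0, maxDoc);
        }
        if (filter != null) {
            query.and(filter);
        }
        return query;
    }

    // Returns a fresh BitSet so the clause's own bits are never modified.
    private static BitSet intersect(BitSet acc, BitSet next) {
        BitSet result = (BitSet) next.clone();
        if (acc != null) {
            result.and(acc);
        }
        return result;
    }

    public static void main(String[] args) {
        BitSet tagA = new BitSet();
        tagA.set(1);
        tagA.set(2);
        BitSet tagB = new BitSet();
        tagB.set(2);
        tagB.set(3);
        List<Clause> onlyTags =
                Arrays.asList(new Clause(tagA, false), new Clause(tagB, false));
        System.out.println(evaluate(onlyTags, 5));   // {2}
    }
}
```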
> In the latest codebase, there is a MatchAllDocsQuery that can be used in
> this case. I also have implemented this sort of thing with a custom
> query parser for a client.
This sounds interesting in itself. I was trying to write one of these
myself, not realising that it had been added into source control
recently. My plan was to get the query returning all docs and then
figure out how to abstract it so that it could filter the returned docs
down on the fly.
I may yet be able to use MatchAllDocsQuery as a means for doing this, as
it will contain a lot of the framework code which I was finding it hard
to write myself (having to write a Query, Weight and Scorer class is
something I wanted to try and abstract away from our own custom ones.)
My main motivation for wanting to use a "real" query as opposed to a
FilteredQuery is that filters cost more up-front, and if you cache them
then they start costing in memory (our indexes are huge, therefore they
cost a LOT of memory.) Real queries are more or less a BitSet
implemented as an iterator, which is far preferable for us.
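To make the trade-off concrete, here is a minimal sketch of the two styles.
It is not Lucene code: LazyVersusCachedSketch, materialize and
LazyDocIterator are hypothetical names, and the IntPredicate stands in for
whatever lookup decides whether a document carries the tag. The filter style
tests every document up front and holds one bit per document; the iterator
style holds only a cursor and does the lookup when the next match is asked
for.

```java
import java.util.BitSet;
import java.util.function.IntPredicate;

// Hypothetical model, not Lucene code.
class LazyVersusCachedSketch {
    // Filter style: visit every doc up front and keep one bit per document.
    static BitSet materialize(IntPredicate hasTag, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        for (int doc = 0; doc < maxDoc; doc++) {
            if (hasTag.test(doc)) {
                bits.set(doc);
            }
        }
        return bits;
    }

    // Query style: only a cursor is stored; matches are found on demand.
    static final class LazyDocIterator {
        private final IntPredicate hasTag;
        private final int maxDoc;
        private int doc = -1;

        LazyDocIterator(IntPredicate hasTag, int maxDoc) {
            this.hasTag = hasTag;
            this.maxDoc = maxDoc;
        }

        // Returns the next matching doc ID, or -1 once the docs run out.
        int next() {
            while (++doc < maxDoc) {
                if (hasTag.test(doc)) {
                    return doc;
                }
            }
            doc = maxDoc;   // stay exhausted on repeated calls
            return -1;
        }
    }

    public static void main(String[] args) {
        IntPredicate everyThird = doc -> doc % 3 == 0;
        System.out.println(materialize(everyThird, 10));   // {0, 3, 6, 9}

        LazyDocIterator it = new LazyDocIterator(everyThird, 10);
        System.out.println(it.next());   // 0
        System.out.println(it.next());   // 3
    }
}
```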
Daniel
--
Daniel Noll
Nuix Australia Pty Ltd
Suite 79, 89 Jones St, Ultimo NSW 2007, Australia
Phone: (02) 9280 0699
Fax: (02) 9212 6902