Re: Queries not derived from the text index

Erik Hatcher Wed, 08 Feb 2006 16:25:41 -0800


On Feb 8, 2006, at 6:46 PM, Daniel Noll wrote:

Erik Hatcher wrote:
One interesting option is to subclass QueryParser and override getFieldQuery. When the field is "tag", return a FilteredQuery (see trunk codebase, or the nightly 1.9 binaries) using a Filter that interfaces with your database. Caching of the filters would be desirable for performance reasons.
Aha. That does sound like it could work, although it will be an interesting exercise in trickery.
I'm not sure it would entirely work at the getFieldQuery level, perhaps at the getBooleanQuery level. The reasoning is this...
  text:camel AND tag:zoo

    This needs to become a single FilteredQuery with a TermQuery
    (text:camel) and a TagFilter (tag:zoo).

Actually I'm pretty certain that it'll work with just getFieldQuery overriding. You can AND or OR a FilteredQuery with any other Query inside a BooleanQuery. I'd be surprised if it didn't work. Scoring is the one tricky caveat to this sort of thing, and perhaps the new "function" capability would be the ticket to adjusting scores for your non-Lucene "search".

  text:camel NOT tag:zoo

    This would be a FilteredQuery with a TermQuery (text:camel) and
    a NotFilter over a TagFilter(tag:zoo).
It's complicated, but it seems like it would work. The only cases which become really hard are cases where there are multiple non-text-index queries in there. Then I might have to use an AndFilter or similar. And in cases where there are only non-text-index queries in there I would have to automatically insert a MatchAllDocsQuery.

Maybe I haven't thought this through enough given your (quite detailed and clear) descriptions of the scenario, but I still think that if getFieldQuery just produces a FilteredQuery where appropriate, AND/OR/NOT will be handled the rest of the way as desired. Well worth a try. Certainly a pure NOT query is the one case that QueryParser and BooleanQuery don't currently like, but that is an easy hack (and perhaps should be part of QueryParser anyway) to use the MatchAllDocsQuery instead.
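
To make the AND / NOT / pure-NOT cases above concrete, here is a minimal sketch of the bit-level logic only. It uses plain java.util.BitSet and no Lucene classes; TagFilterSketch, its map-backed "database", and all method names are hypothetical stand-ins for illustration, not the QueryParser or Filter API.

```java
import java.util.BitSet;
import java.util.Map;

// Hypothetical stand-in for the tag filters discussed above.
// Uses only java.util.BitSet; none of this is Lucene's API.
final class TagFilterSketch {
    private final Map<String, int[]> tagToDocIds; // simulates the external tag database
    private final int maxDoc;                     // number of documents in the index

    TagFilterSketch(Map<String, int[]> tagToDocIds, int maxDoc) {
        this.tagToDocIds = tagToDocIds;
        this.maxDoc = maxDoc;
    }

    // One bit per document carrying the given tag.
    BitSet tagFilter(String tag) {
        BitSet bits = new BitSet(maxDoc);
        for (int doc : tagToDocIds.getOrDefault(tag, new int[0])) {
            bits.set(doc);
        }
        return bits;
    }

    // "NotFilter": every document in [0, maxDoc) that the filter excludes.
    BitSet not(BitSet filter) {
        BitSet result = (BitSet) filter.clone();
        result.flip(0, maxDoc);
        return result;
    }

    // "AndFilter": documents accepted by both filters; inputs are left unmodified.
    static BitSet and(BitSet a, BitSet b) {
        BitSet result = (BitSet) a.clone();
        result.and(b);
        return result;
    }

    // Match-all base, used when a query contains only a negated clause.
    BitSet matchAll() {
        BitSet result = new BitSet(maxDoc);
        result.set(0, maxDoc);
        return result;
    }
}
```

With this shape, a pure "NOT tag:zoo" is just and(matchAll(), not(tagFilter("zoo"))), which mirrors the MatchAllDocsQuery hack mentioned above.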

My main motivation for wanting to use a "real" query as opposed to a FilteredQuery is that filters cost more up-front, and if you cache them then they start costing in memory (our indexes are huge, therefore they cost a LOT of memory.) Real queries are more or less a BitSet implemented as an iterator, which is far preferable for us.

I'm pulling off the same sort of stunts with the faceted search system I've developed. The data has not yet reached the "huge" level, but it is growing and memory will become more of a concern. There is a memory-saving alternative BitSet-like implementation available in JIRA somewhere (sorry, no reference handy, but it's there and probably findable by a "BitSet" search). Perhaps that is worth consideration in your case. There is also discussion about changing how Filters work to not use a BitSet directly but rather an enumeration-like interface such as TermEnum, etc.
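
As a rough illustration of the iterator-versus-BitSet trade-off, the sketch below intersects two ascending doc-id streams without allocating one bit per document. DocIdIterator, SortedArrayIterator and AndIterator are hypothetical names invented for this example; they are not Lucene interfaces, and the arrays stand in for whatever the index or database would actually stream.

```java
import java.util.Arrays;

// Hypothetical enumeration-style interface: ascending doc ids, -1 when exhausted.
interface DocIdIterator {
    int next();
}

// Streams doc ids from an array (assumed distinct); stands in for a real source.
final class SortedArrayIterator implements DocIdIterator {
    private final int[] docs;
    private int pos = 0;

    SortedArrayIterator(int... docs) {
        this.docs = docs.clone();
        Arrays.sort(this.docs); // guarantee ascending order
    }

    @Override
    public int next() {
        return pos < docs.length ? docs[pos++] : -1;
    }
}

// Intersection of two iterators; memory use does not depend on index size.
final class AndIterator implements DocIdIterator {
    private final DocIdIterator a;
    private final DocIdIterator b;

    AndIterator(DocIdIterator a, DocIdIterator b) {
        this.a = a;
        this.b = b;
    }

    @Override
    public int next() {
        int x = a.next();
        int y = b.next();
        // Advance whichever side is behind until both agree or one runs out.
        while (x != -1 && y != -1) {
            if (x == y) {
                return x;
            }
            if (x < y) {
                x = a.next();
            } else {
                y = b.next();
            }
        }
        return -1;
    }
}
```

The cost moves from memory to per-call work: nothing is cached, so each search pays for the walk again.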


        Erik

