Morus Walter wrote:
Searches must be possible on any combination of collections.
A typical search includes ~ 40 collections.

Now the question is how best to implement this in Lucene.

Currently I see basically three possibilities:
- create a data field containing the collection name for each document
  and extend the query by an OR-combined list of queries on this name field.

Are lots of different combinations of collections used frequently? Probably not. If only a handful of different subsets of collections are frequently searched, then QueryFilter could be very useful.


In this approach you construct a QueryFilter for each combination of collections, passing it the collection name query. Keep the query filter around and re-use it whenever a query with that combination of collections is made. This is very fast. It uses one bit per document per filter. So if you have a million documents and eight common combinations of collections then this would use one megabyte.
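
The memory figure above can be checked with a little arithmetic. The sketch below is only an illustration: the class and method names are made up here and are not part of Lucene, and it counts just the raw bits, ignoring any per-object overhead.

```java
final class FilterMemory {
    // Bytes needed to hold `filters` cached bit sets over `numDocs` documents,
    // at one bit per document per filter (object overhead not counted).
    static long bytesFor(long numDocs, int filters) {
        long bytesPerFilter = (numDocs + 7) / 8; // round up to whole bytes
        return bytesPerFilter * filters;
    }

    public static void main(String[] args) {
        // One million documents, eight cached combinations of collections.
        System.out.println(bytesFor(1000000L, 8)); // prints 1000000
    }
}
```

So eight filters over a million documents come to about one megabyte, and each additional cached combination costs roughly another 125 KB.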

You could also keep a cache of QueryFilters in a LinkedHashMap (JDK 1.4). If the size of the cache exceeds a limit, throw away its eldest entry by overriding the removeEldestEntry method. That way, if any combination of collections is possible, but only a few are probable, you can just cache the common subsets as QueryFilters. Probably we should provide such a QueryFilterCache class with Lucene...
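
A minimal sketch of such a cache, using only java.util, might look like the following. Several things here are assumptions rather than anything Lucene provides: the class name FilterCache, the key() helper, the use of a sorted set of collection names as the key (so that the order of names does not matter), and the choice of access order (least recently used) for eviction. The value type is left generic so the sketch compiles without Lucene; in practice it would hold QueryFilter instances. It also uses generics, which are newer than JDK 1.4.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical bounded cache, keyed by a set of collection names.
// V stands in for the cached filter type (e.g. a QueryFilter).
class FilterCache<V> extends LinkedHashMap<Set<String>, V> {
    private final int maxEntries;

    FilterCache(int maxEntries) {
        // accessOrder = true: iteration order runs from least to most
        // recently accessed, so the "eldest" entry is the least recently used.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<Set<String>, V> eldest) {
        // Called by put(); returning true drops the eldest entry.
        return size() > maxEntries;
    }

    // Build an order-independent key from collection names.
    static Set<String> key(String... collections) {
        return new TreeSet<String>(Arrays.asList(collections));
    }
}
```

On a search, look up the filter under key(...) for the requested collections; on a miss, build the filter from the collection name query and put() it into the cache. Uncommon combinations then fall out of the cache on their own while the common ones stay resident.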

This is the approach that I would use.

Doug

