Hi,

We are currently evaluating Lucene.

The data we'd like to index consists of ~ 80 collections of documents
(from a few hundred up to 200000 documents per collection, ~ 1.5 million
documents in total; the average document size is on the order of 1 kB).

Searches must be possible on any combination of collections.
A typical search includes ~ 40 collections.

Now the question is how best to implement this in Lucene.

Currently I see basically three possibilities:
- create a data field containing the collection name for each document
  and extend the query by an OR-combined list of queries on this name field.
- create an index per collection and use a MultiSearcher to search all
  interesting indexes.
- (a third one I just discovered): create a data field containing a
  marker for each collection
  x100000000000000000... for the first collection
  x010000000000000000... for the second
  x001000000000000000... for the third
  and so on.
  The query might use a wildcard search on this field using x?0?00000...
  specifying '?' for each collection that should be searched on, and '0'
  for the others.
  The marker would be very long though (the number of collections is
  growing, so we have to reserve space for new ones as well).
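
To make the first and third options concrete, here is a minimal sketch in
plain Java of how the query restriction could be built in each case. It does
not call Lucene at all; the class name, the field name "collection", and the
helper methods are made up for illustration, and I assume collection names
are single tokens without characters that need escaping in the query syntax.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Lucene: builds the restriction strings only.
class CollectionRestrictionSketch {

    // Option 1: a required group of OR-combined clauses on the name field,
    // e.g. +(collection:news collection:patents), appended to the user query.
    static String orClause(String field, List<String> names) {
        return names.stream()
                .map(n -> field + ":" + n)
                .collect(Collectors.joining(" ", "+(", ")"));
    }

    // Option 3: the marker stored with each document of collection `index`
    // (0-based): 'x' followed by `width` digits with a single '1'.
    static String markerFor(int width, int index) {
        StringBuilder sb = new StringBuilder("x");
        for (int i = 0; i < width; i++) {
            sb.append(i == index ? '1' : '0');
        }
        return sb.toString();
    }

    // Option 3: the wildcard pattern, '?' for each selected collection
    // and '0' for every other position.
    static String markerPattern(int width, Set<Integer> selected) {
        StringBuilder sb = new StringBuilder("x");
        for (int i = 0; i < width; i++) {
            sb.append(selected.contains(i) ? '?' : '0');
        }
        return sb.toString();
    }

    // Single-character wildcard match, only to check the scheme itself.
    static boolean matches(String pattern, String marker) {
        if (pattern.length() != marker.length()) {
            return false;
        }
        for (int i = 0; i < pattern.length(); i++) {
            char p = pattern.charAt(i);
            if (p != '?' && p != marker.charAt(i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(orClause("collection", Arrays.asList("news", "patents")));
        Set<Integer> selected = new HashSet<>(Arrays.asList(0, 2));
        System.out.println(markerPattern(9, selected));
    }
}
```

With 80 collections the marker is 81 characters per document, and the width
has to be fixed up front, which is the drawback mentioned above.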

So far we have set up the first approach (one index; size ~ 750 MB), and
this seems to work in principle and with reasonable performance.
I'm not too optimistic about the second approach. If I understand the docs
correctly, this would be a sequential search on each involved index,
followed by combining the results.
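
What I picture for the second approach is roughly the following. This is
only my reading of the docs, written as plain Java without any Lucene calls;
it is not how MultiSearcher is actually implemented, and the Hit class and
mergeTopN method are made-up names. It also assumes that scores coming from
separate indexes can be compared directly, which may not hold.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: search each index in turn, then merge by score.
class MergeHitsSketch {

    // One search result: which collection index it came from, the
    // document number inside that index, and its score.
    static final class Hit {
        final String collection;
        final int doc;
        final float score;

        Hit(String collection, int doc, float score) {
            this.collection = collection;
            this.doc = doc;
            this.score = score;
        }
    }

    // Combine the per-index result lists and keep the n best-scoring hits.
    static List<Hit> mergeTopN(List<List<Hit>> perIndexHits, int n) {
        List<Hit> all = new ArrayList<>();
        for (List<Hit> hits : perIndexHits) {
            all.addAll(hits);
        }
        // Highest score first; the sort is stable, so ties keep index order.
        all.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return new ArrayList<>(all.subList(0, Math.min(n, all.size())));
    }

    public static void main(String[] args) {
        List<Hit> a = Arrays.asList(new Hit("a", 1, 0.9f), new Hit("a", 2, 0.3f));
        List<Hit> b = Arrays.asList(new Hit("b", 7, 0.5f));
        for (Hit h : mergeTopN(Arrays.asList(a, b), 2)) {
            System.out.println(h.collection + "/" + h.doc + " " + h.score);
        }
    }
}
```

With ~ 40 indexes per typical search, the cost I am worried about is the
loop over the indexes rather than the merge step itself.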

So my questions are:
- Does anyone have experience with such a setup?
- Are there other approaches to deal with it?
- Is my expectation that multiple indexes perform worse reasonable, or
  should we give it a try?
- How is wildcard search implemented? Could it be an improvement?

I understand that in the end we have to check this ourselves, but I'd
appreciate any hints and advice, since I couldn't find much on this
issue in the docs.

greetings
        Morus
