Hi,
we are currently evaluating Lucene.
The data we'd like to index consists of ~ 80 collections of documents
(a few hundred up to 200000 documents per collection, ~ 1.5 million documents
total; the average document size is on the order of 1 kB).
Searches must be possible on any combination of collections.
A typical search includes ~ 40 collections.
Now the question is how best to implement this in Lucene.
Currently I see basically three possibilities:
- create a data field containing the collection name for each document
and extend the query by an OR-combined list of queries on this name field.
- create an index per collection and use a MultiSearcher to search all
interesting indexes.
- (a third one I just discovered): create a data field containing a
marker for each collection
x100000000000000000... for the first collection
x010000000000000000... for the second
x001000000000000000... for the third
and so on.
The query might use a wildcard search on this field using x?0?00000...
specifying '?' for each collection that should be searched on, and '0'
for the others.
The marker would be very long though (the number of collections is
growing, so we have to keep space for new ones as well).
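To make the third idea concrete, here is a rough sketch in plain Java (no Lucene calls) of how the markers and the wildcard pattern could be built. The class and method names are made up for illustration, and the small matcher only mimics a '?' wildcard (exactly one arbitrary character) so the idea can be checked without an index.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of Lucene.
class MarkerPattern {
    // Marker stored with every document of collection `index` (0-based):
    // 'x', then '1' in that collection's slot and '0' in all other slots.
    static String markerFor(int index, int slots) {
        StringBuilder sb = new StringBuilder("x");
        for (int i = 0; i < slots; i++) {
            sb.append(i == index ? '1' : '0');
        }
        return sb.toString();
    }

    // Wildcard pattern: '?' for each collection to search, '0' for the others.
    static String patternFor(Set<Integer> selected, int slots) {
        StringBuilder sb = new StringBuilder("x");
        for (int i = 0; i < slots; i++) {
            sb.append(selected.contains(i) ? '?' : '0');
        }
        return sb.toString();
    }

    // Stand-in for the wildcard match: '?' matches any single character.
    static boolean matches(String pattern, String marker) {
        if (pattern.length() != marker.length()) {
            return false;
        }
        for (int i = 0; i < pattern.length(); i++) {
            char p = pattern.charAt(i);
            if (p != '?' && p != marker.charAt(i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Set<Integer> selected = new TreeSet<Integer>(Arrays.asList(0, 2));
        String pattern = patternFor(selected, 5);
        System.out.println(pattern);
        for (int c = 0; c < 5; c++) {
            System.out.println(markerFor(c, 5) + " " + matches(pattern, markerFor(c, 5)));
        }
    }
}
```

A document of an unselected collection has its '1' in a slot where the pattern demands '0', so it is rejected; documents of the selected collections pass.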
So far we have set up the first approach (one index; size ~ 750 MB) and this
seems to work in principle and with reasonable performance.
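Roughly, the query extension for the first approach looks like the sketch below. It only builds a query string in Lucene's query syntax; the field name "collection" and the helper are placeholders for illustration, and it assumes collection names are single tokens that need no escaping.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Lucene.
class CollectionFilter {
    // Require the user's query AND at least one of the selected collections.
    static String restrict(String userQuery, List<String> collections) {
        if (collections.isEmpty()) {
            throw new IllegalArgumentException("no collection selected");
        }
        StringBuilder sb = new StringBuilder();
        sb.append("+(").append(userQuery).append(") +(");
        for (int i = 0; i < collections.size(); i++) {
            if (i > 0) {
                sb.append(" OR ");
            }
            // "collection" is an assumed field name.
            sb.append("collection:").append(collections.get(i));
        }
        return sb.append(")").toString();
    }

    public static void main(String[] args) {
        System.out.println(restrict("lucene index", Arrays.asList("news", "patents")));
    }
}
```

With ~ 40 collections in a typical search, the second clause grows to ~ 40 OR-ed terms.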
I'm not too optimistic about the second approach. If I understand the docs
correctly this would be a sequential search on each involved index and
combining the results.
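What I picture is something like the following plain-Java sketch. It only illustrates my reading of the docs (search each index in turn, then merge the hits by score); it is not taken from the MultiSearcher source, and the Hit class and merge method are invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration, not Lucene code.
class SequentialMerge {
    static final class Hit {
        final String collection;
        final int doc;
        final float score;

        Hit(String collection, int doc, float score) {
            this.collection = collection;
            this.doc = doc;
            this.score = score;
        }
    }

    // Visit the per-index result lists one after another, collect all hits,
    // then keep the n best by descending score.
    static List<Hit> merge(List<List<Hit>> perIndex, int n) {
        List<Hit> all = new ArrayList<Hit>();
        for (List<Hit> hits : perIndex) {
            all.addAll(hits);
        }
        Collections.sort(all, new Comparator<Hit>() {
            public int compare(Hit a, Hit b) {
                return Float.compare(b.score, a.score);
            }
        });
        return new ArrayList<Hit>(all.subList(0, Math.min(n, all.size())));
    }

    public static void main(String[] args) {
        List<Hit> a = Arrays.asList(new Hit("a", 1, 0.9f), new Hit("a", 7, 0.2f));
        List<Hit> b = Arrays.asList(new Hit("b", 3, 0.5f));
        for (Hit h : merge(Arrays.asList(a, b), 2)) {
            System.out.println(h.collection + ":" + h.doc + " " + h.score);
        }
    }
}
```

If this picture is right, the cost would grow with the number of indexes involved, which is what worries me with ~ 40 collections per search.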
So questions:
- has anyone experience with such a setup?
- are there other approaches to deal with it?
- is my expectation that multiple indexes perform worse reasonable, or should
we give it a try?
- how is wildcard search implemented? Could the third approach be an improvement?
I understand that in the end we have to test this ourselves, but I'd
appreciate any hints and advice, since I couldn't find much on this
issue in the docs.
greetings
Morus