> The best example that I've been able to find is the Yahoo research > lab - as I understand it, this is a Nutch (i.e. Lucene) > implementation that's providing impressive performance over a > 100 million document repository.
This demo runs on a handful of boxes. It was originally running on three dual-processor boxes, but I think Yahoo! subsequently moved it to six or eight single-processor boxes. Queries are broadcast to all servers, and the top-scoring matches overall are presented.
In Nutch-based benchmarks, we found that a single-processor box with 4GB of memory and a 2M page Nutch index (i.e., the entire index fits in RAM) could handle over 20 Nutch searches/second. A box with 1GB of memory and a 20M page Nutch index (i.e., the entire index does not fit in memory) could only handle around 1 or 2 Nutch searches/second. These were done with Lucene 1.3. Lucene 1.4 should be somewhat faster. Performance will obviously vary with processor speed, disk speed, average document size, average number terms per query, etc.
Doug
--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
