I've personally indexed over 1,000,000 documents and Lucene doesn't even breathe hard.
We are in the hundreds of millions and growing, and Lucene does tend to sweat a little, although it can certainly handle it. At that scale you'll need to understand more of Lucene's internals. For example, we've had some serious bottlenecks when it comes to sorting. A comment like "sorting with strings takes more memory" really compounds when you have 4 million search results to sort! You'll definitely want to use multisearchers and partition your indexes *intelligently* according to your business logic. You might even run into a scenario where you need multiple copies of your index, each partitioned in a different way depending on the use case. Finally, be prepared for indexing to take a looong time.
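To make "partition according to your business logic" a bit more concrete, here is a minimal sketch in plain Java. It assumes, purely for illustration, that documents are partitioned by calendar month, so a date-restricted query only needs the partitions that overlap its range. The class name, the `index-YYYY-MM` naming scheme, and the choice of month as the partition key are all hypothetical, not something Lucene prescribes, and no Lucene calls are shown.

```java
import java.time.YearMonth;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: maps a date range to the index partitions that cover it.
class PartitionRouter {

    // Assumed naming scheme: one index directory per calendar month.
    // YearMonth.toString() yields e.g. "2006-03".
    static String partitionFor(YearMonth month) {
        return "index-" + month;
    }

    // Returns the partition names for every month from 'from' to 'to', inclusive.
    // An empty list is returned when 'from' is after 'to'.
    static List<String> partitionsFor(YearMonth from, YearMonth to) {
        List<String> names = new ArrayList<>();
        for (YearMonth m = from; !m.isAfter(to); m = m.plusMonths(1)) {
            names.add(partitionFor(m));
        }
        return names;
    }
}
```

Each returned name would then be opened as its own searcher and the set combined behind a multisearcher, so a query over three months touches three small indexes instead of the whole corpus. A second copy of the index partitioned by a different key (customer, say) would get its own router along the same lines.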