There was a bug (recently fixed) when creating indexes with more than a couple hundred million documents, so you should use 1.3 RC2, which includes the fix for it.

The biggest indexes I've personally created contain around 30M documents. I maintain these as a set of separately updated indexes, then merge them together into a single big index for deployment. I find this easier than trying to maintain a single massive index.
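For illustration, here is a minimal sketch of that merge step, assuming the Lucene 1.3-era API; the sub-index paths and the choice of analyzer are hypothetical:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class MergeIndexes {
    public static void main(String[] args) throws Exception {
        // The separately updated sub-indexes (paths are hypothetical).
        Directory[] subIndexes = {
            FSDirectory.getDirectory("/indexes/part1", false),
            FSDirectory.getDirectory("/indexes/part2", false),
            FSDirectory.getDirectory("/indexes/part3", false)
        };

        // Create the deployment index and merge the parts into it.
        IndexWriter writer =
            new IndexWriter("/indexes/deploy", new StandardAnalyzer(), true);
        writer.addIndexes(subIndexes); // copies and merges the sub-indexes
        writer.optimize();             // collapse to one segment for faster search
        writer.close();
    }
}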

My guess is that your search times won't be too fast, probably on the order of a few seconds (more than one, less than ten), since searching will be disk-bound. You could improve performance by distributing search over multiple machines, each searching a smaller index that covers a subset of the entire collection.
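If you keep the sub-indexes separate rather than merging them, Lucene's MultiSearcher can present them as one logical index; here is a minimal single-machine sketch, again assuming the 1.3-era API (the paths, field name, and query string are hypothetical). With RemoteSearchable, the same pattern can span machines:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MultiSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Searchable;

public class SearchSubIndexes {
    public static void main(String[] args) throws Exception {
        // One searcher per sub-index; these could be remote searchables
        // on different machines instead of local ones.
        Searchable[] searchables = {
            new IndexSearcher("/indexes/part1"),
            new IndexSearcher("/indexes/part2"),
            new IndexSearcher("/indexes/part3")
        };
        MultiSearcher searcher = new MultiSearcher(searchables);

        // "contents" is a hypothetical field name.
        Query query =
            QueryParser.parse("scalability", "contents", new StandardAnalyzer());
        Hits hits = searcher.search(query);
        System.out.println(hits.length() + " matching documents");

        searcher.close();
    }
}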

Doug

Victor Hadianto wrote:
Hi all,

I'm interested to know how big a Lucene index, in number of documents, you have
experience with. We are trying to index on the order of 300 million text
documents. Each document will be quite small, around 10 KB.

Any insight into how Lucene scales to this many documents, both for
creating the index and for searching?

thanks,

/victor




