There was a bug (recently fixed) when creating indexes with more than a couple hundred million documents, so you should use 1.3 RC2, which includes the fix for it.

The biggest indexes I've personally created contain around 30M documents. I maintain these as a set of separately updated indexes, then merge them together into a single big index for deployment. I find this easier than trying to maintain a single massive index.
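For illustration, here is a minimal sketch of that merge step, assuming the Lucene 1.3-era API; the sub-index paths and the choice of analyzer are hypothetical:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class MergeIndexes {
    public static void main(String[] args) throws Exception {
        // The separately updated sub-indexes (paths are hypothetical).
        Directory[] subIndexes = {
            FSDirectory.getDirectory("/indexes/part1", false),
            FSDirectory.getDirectory("/indexes/part2", false),
            FSDirectory.getDirectory("/indexes/part3", false)
        };

        // Create the deployment index and merge the parts into it.
        IndexWriter writer =
            new IndexWriter("/indexes/deploy", new StandardAnalyzer(), true);
        writer.addIndexes(subIndexes); // copies and merges the sub-indexes
        writer.optimize();             // collapse to one segment for faster search
        writer.close();
    }
}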

My guess is that your search times won't be too fast, probably on the order of a few seconds (more than one, less than ten), since searching will be disk-bound. You could improve performance by distributing search over multiple machines, each searching a smaller index that covers a subset of the entire collection.
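If you keep the sub-indexes separate rather than merging them, Lucene's MultiSearcher can present them as one logical index; here is a minimal single-machine sketch, again assuming the 1.3-era API (the paths, field name, and query string are hypothetical). With RemoteSearchable, the same pattern can span machines:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MultiSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Searchable;

public class SearchSubIndexes {
    public static void main(String[] args) throws Exception {
        // One searcher per sub-index; these could be remote searchables
        // on different machines instead of local ones.
        Searchable[] searchables = {
            new IndexSearcher("/indexes/part1"),
            new IndexSearcher("/indexes/part2"),
            new IndexSearcher("/indexes/part3")
        };
        MultiSearcher searcher = new MultiSearcher(searchables);

        // "contents" is a hypothetical field name.
        Query query =
            QueryParser.parse("scalability", "contents", new StandardAnalyzer());
        Hits hits = searcher.search(query);
        System.out.println(hits.length() + " matching documents");

        searcher.close();
    }
}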

Doug

Victor Hadianto wrote:
Hi all,

I'm interested to know how big a Lucene index, in number of documents, you have
experience with. We are trying to index on the order of 300 million text
documents. Each document will be quite small, around 10 KB.

Any insight into how Lucene scales to this many documents, both for
creating the index and for searching?

thanks,

/victor




