Armbrust, Daniel C. wrote:
What is the file handle limit on XP?
While I was trying to build this index, the biggest limitation of Lucene that I ran into was optimization. Optimization kills the indexer's performance when you get between 3 and 5 million documents in an index. On my Windows XP box, I had to reoptimize every 100,000 documents to keep from running out of file handles.
When batch indexing, optimizing before the end slows things down, and should not be required.
Are you otherwise opening index readers in the same process? Index readers use a lot more file handles than the index writer, since they must keep all files in all segments open. For large indexes it's best to do your indexing in a separate process which never opens an IndexReader.
The max a reader will keep open is:
mergeFactor * log_base_mergeFactor(N) * files_per_segment
With mergeFactor=10 (the default) and 1 million documents, and 10 files per segment, a reader on a never-optimized index should at most require 600 open files, and typically half that.
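As a rough sketch of that arithmetic (the function and parameter names here are illustrative only, not part of Lucene), the reader bound can be computed like this:

```python
import math

def max_reader_files(merge_factor, num_docs, files_per_segment):
    """Upper bound on files a reader keeps open on a never-optimized index:
    mergeFactor * log_base_mergeFactor(N) * files_per_segment."""
    # Number of merge levels, i.e. log of num_docs in base merge_factor.
    levels = math.log(num_docs) / math.log(merge_factor)
    return merge_factor * levels * files_per_segment

# Figures from the example: mergeFactor=10, 1 million documents,
# 10 files per segment.
reader_bound = max_reader_files(10, 1_000_000, 10)
print(round(reader_bound))  # 600
```

The result is rounded because the floating-point logarithm may land a hair under the exact value.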
A writer will open:
(1 + mergeFactor) * files_per_segment
With mergeFactor=10 (the default), 1 million documents, and 10 files per segment, a writer on a never-optimized index would require 110 open files.
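The writer figure can be checked the same way. This is only a sketch: the function name is made up for illustration, and 10 files per segment is an assumption carried over from the reader example.

```python
def max_writer_files(merge_factor, files_per_segment):
    """Files a writer opens: (1 + mergeFactor) * files_per_segment.
    Note that the document count does not appear in this formula."""
    return (1 + merge_factor) * files_per_segment

# Default mergeFactor, assuming 10 files per segment.
print(max_writer_files(10, 10))  # 110

# A larger mergeFactor raises the requirement proportionally,
# again assuming 10 files per segment.
print(max_writer_files(50, 10))  # 510
```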
I just built a 3M document index on Linux in five hours, with no intermediate optimizations. I set the mergeFactor to 50. This required around 500 file handles, well beneath the 1024 limit.
Doug
