Hi, I am using lucene -1.3 final version.
Let us say there are 10,000 files with size of 20 MB. So, total file system size = 10,000 * 20 MB = 200 GB. I want to index these files. Let us say, the merge factor = 10 Min heap size required by JVM = 10 * 20 = 200 MB >From the http://www.onjava.com/pub/a/onjava/2003/03/05/lucene.html article, ------------------------------------------------------------------------ --- For instance, if we set mergeFactor to 10, a new segment will be created on the disk for every 10 documents added to the index. When the 10th segment of size 10 is added, all 10 will be merged into a single segment of size 100. When 10 such segments of size 100 have been added, they will be merged into a single segment containing 1000 documents, and so on. Therefore, at any time, there will be no more than 9 segments in each power of 10 index size. ------------------------------------------------------------------------ ----- If the lucene is indexing the 1000th document, then the current time segment size would be 100. At that time, how many documents would the lucene hold in memory (10 documents or 100 documents)? If the lucene holds 100 documents, then min heap memory required will be 100 * 20 = 2 GB, which is unlikely. Is the optimize process memory intensive? How much memory lucene would take while doing the optimize? Is it safe to assume that the maximum heap size required by lucene is mergefactor * maximum_file_size? I am planning to use the default maxMergeDocs as default, as stated in the article. Any help on the above questions is highly appreciated. Thanks, -Sreedhar --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
