Hi,
First, I am not sure if I should post my question here, since I am using CLucene (C++ port of Lucene) to build indexes. Hope someone here could help me.
I am indexing at a solaris machine with 1G memory. I use ram writer and fs writer, and write into fs index once a while. Now I am testing to index single input files. While testing on files < 50M, the program works well. While indexing bigger file, it runs out of 1G memory and crashes, whatever I set some parameters such as merge factor and the frequency writing to disk. My input files are in ASN.1 format, each with nested entries, and each entry with various number of fields. I index every outermost entry as a lucene document, and each data field as a lucene field. So what is different from others, the number of fields indexed in my program is quite big. Some files have more than 1000 different field names. There is no problem with max file descriptors. For those failed, some lucene documents have more than 40,000 field pairs (duplicate field names with different values). I think it is the reason why memory is consumed vastly. One of the failed cases is with an input file size: 66M, and crashes after processing about 3800 documents.
Is there any way to improve the program to use less memory? Any suggestion would be apprecited!
Regards, Yue Sun
--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
