Re: java on 64 bits
Hello everyone! Here are the conclusions we reached after digging further into the problem; maybe they help someone:

1) The filling of the hard drive was not due to 64-bit Java; that was a coincidence.

2) The intermediate files Yonik talked about (*.f*) were present because the indexing process was merging very large segments, which took a while.

3) We are indexing a continuous stream of data. As documents become out-of-date they are deleted from the index. To sustain throughput we use a batch indexing strategy, setting mergeFactor to 50 but never optimizing. The downside is that it takes a long time before we reach the point where deleted documents are purged, which only happens when out-of-date segments are merged. So we end up with large segments that contain nothing but deleted documents, which could be reclaimed if they weren't still included in the segments file.

4) Assuming that frequently merging into a large segment doesn't affect throughput, we should probably have implemented the strategy described by Doug Cutting here (scroll down): http://www.gossamer-threads.com/lists/lucene/java-user/29350?page=last

Hth,
casper roxana

Thanks everyone for the answers! I'm experimenting with your suggestions; I will let you know if something interesting pops up.

roxana

1) Make sure the failure was due to an OutOfMemoryError and not something else.

2) If you have enough memory, increase the max JVM heap size (-Xmx).

3) If you don't need more than 1.5G or so of heap, use the 32-bit JVM instead (depending on architecture, it can actually be a little faster because more references fit in the CPU cache).

4) See how many indexed fields you have and whether you can consolidate any of them.

4.5) If you don't have too many indexed fields, and have enough spare file descriptors, try using the non-compound file format instead.

5) Run with the latest version of Lucene (the 1.9 dev version), which may have better memory usage during optimize/segment merges.

6) If/when optional norms (http://issues.apache.org/jira/browse/LUCENE-448) makes it into Lucene, you can apply it to any indexed fields for which you don't need index-time boosting or length normalization.

As for getting rid of your current intermediate files, I'd rebuild from scratch just to ensure things are OK.

-Yonik
Now hiring -- http://tinyurl.com/7m67g

On 10/21/05, Roxana Angheluta [EMAIL PROTECTED] wrote:

Thank you, Yonik, it seems this is the case. What can we do in this case? Would running the program with java -d32 be a solution?

Thanks again,
roxana

One possibility: if Lucene runs out of memory while adding or optimizing, it can leave unused files behind that increase the size of the index. A 64-bit JVM will require more memory than a 32-bit one due to the size of all references being doubled.

If you are using the compound file format (the default; check for .cfs files), then it's easy to check if you have this problem by seeing if there are any *.f* files in the index directory. These are intermediate files and shouldn't exist for long in a compound-file index.

-Yonik
Now hiring -- http://tinyurl.com/7m67g

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
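Point 3 of the conclusions (deleted documents are only purged when their segments get merged) can be sketched with a rough back-of-the-envelope model. This is a simplified model of a logarithmic merge policy, not Lucene's actual code; the class and method names are made up for illustration:

```java
// Simplified model (not Lucene's actual code) of how a logarithmic
// merge policy with a high mergeFactor delays large merges, and hence
// delays the purging of deleted documents.
public class MergeDelay {

    // With mergeFactor f, level-0 segments are combined into a level-1
    // segment once f of them accumulate, f level-1 segments combine
    // into a level-2 segment, and so on. A deleted document sitting in
    // a level-n segment is only reclaimed when that segment itself
    // participates in a merge.
    static long batchesUntilLevelMerge(int mergeFactor, int level) {
        long batches = 1;
        for (int i = 0; i < level; i++) {
            batches *= mergeFactor;
        }
        return batches; // flushed batches before a merge at this level
    }

    public static void main(String[] args) {
        // With mergeFactor=50, a level-2 merge only happens after
        // 50 * 50 = 2500 flushed batches, so deletes in a level-2
        // segment can linger for a very long time.
        System.out.println(batchesUntilLevelMerge(50, 2));
    }
}
```

Under this model, raising mergeFactor improves indexing throughput but multiplies how long a stale segment, and its deleted documents, survives before a merge touches it.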
Re: java on 64 bits
Thank you, Yonik, it seems this is the case. What can we do in this case? Would running the program with java -d32 be a solution?

Thanks again,
roxana

One possibility: if Lucene runs out of memory while adding or optimizing, it can leave unused files behind that increase the size of the index. A 64-bit JVM will require more memory than a 32-bit one due to the size of all references being doubled.

If you are using the compound file format (the default; check for .cfs files), then it's easy to check if you have this problem by seeing if there are any *.f* files in the index directory. These are intermediate files and shouldn't exist for long in a compound-file index.

-Yonik
Now hiring -- http://tinyurl.com/7m67g

On 10/20/05, Roxana Angheluta [EMAIL PROTECTED] wrote:

Hi everybody! We have a large Lucene index which gets updated very often. Until recently the Java virtual machine used to manage the index was 32-bit, although the program was running on a 64-bit machine. Last week we switched to 64-bit Java, and since then we have experienced strange problems: the index grows very large. I'm not sure the two are related; that's why I ask here: is it possible that the index got corrupted after we updated the JVM? Is there any relation between the size of the index and the JVM used?

I hope the questions make sense,
thanks,
roxana
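The check Yonik describes (compound-file index with lingering *.f* intermediate files) can be automated with plain Java. A minimal sketch; the class and method names are ours, not a Lucene API, and the `.f` + digits pattern is an assumption about the intermediate file naming described above:

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Standalone sanity check: in a compound-file index (*.cfs present),
// intermediate *.f* files should not stick around for long.
public class IndexDirCheck {

    // Pure check on a list of file names, so it is easy to test.
    static List<String> leftoverIntermediateFiles(List<String> names) {
        boolean compound = false;
        List<String> intermediates = new ArrayList<String>();
        for (String name : names) {
            if (name.endsWith(".cfs")) {
                compound = true;
            }
            // intermediate per-field files look like _1.f0, _1.f1, ...
            if (name.matches(".*\\.f\\d+")) {
                intermediates.add(name);
            }
        }
        // only suspicious when the index is compound-format
        return compound ? intermediates : new ArrayList<String>();
    }

    public static void main(String[] args) {
        String[] listed = new File(args[0]).list();
        if (listed != null) {
            System.out.println(leftoverIntermediateFiles(Arrays.asList(listed)));
        }
    }
}
```

Run it against the index directory; a non-empty result suggests a merge or optimize died partway through, likely from the OutOfMemoryError discussed in this thread.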
Re: java on 64 bits
You can also try to clean up the index with Luke.

Yonik Seeley wrote:

One possibility: if Lucene runs out of memory while adding or optimizing, it can leave unused files behind that increase the size of the index. A 64-bit JVM will require more memory than a 32-bit one due to the size of all references being doubled.

If you are using the compound file format (the default; check for .cfs files), then it's easy to check if you have this problem by seeing if there are any *.f* files in the index directory. These are intermediate files and shouldn't exist for long in a compound-file index.

-Yonik
Now hiring -- http://tinyurl.com/7m67g

On 10/20/05, Roxana Angheluta [EMAIL PROTECTED] wrote:

Hi everybody! We have a large Lucene index which gets updated very often. Until recently the Java virtual machine used to manage the index was 32-bit, although the program was running on a 64-bit machine. Last week we switched to 64-bit Java, and since then we have experienced strange problems: the index grows very large. I'm not sure the two are related; that's why I ask here: is it possible that the index got corrupted after we updated the JVM? Is there any relation between the size of the index and the JVM used?

I hope the questions make sense,
thanks,
roxana

--
regards,
Volodymyr Bychkoviak
Re: java on 64 bits
1) Make sure the failure was due to an OutOfMemoryError and not something else.

2) If you have enough memory, increase the max JVM heap size (-Xmx).

3) If you don't need more than 1.5G or so of heap, use the 32-bit JVM instead (depending on architecture, it can actually be a little faster because more references fit in the CPU cache).

4) See how many indexed fields you have and whether you can consolidate any of them.

4.5) If you don't have too many indexed fields, and have enough spare file descriptors, try using the non-compound file format instead.

5) Run with the latest version of Lucene (the 1.9 dev version), which may have better memory usage during optimize/segment merges.

6) If/when optional norms (http://issues.apache.org/jira/browse/LUCENE-448) makes it into Lucene, you can apply it to any indexed fields for which you don't need index-time boosting or length normalization.

As for getting rid of your current intermediate files, I'd rebuild from scratch just to ensure things are OK.

-Yonik
Now hiring -- http://tinyurl.com/7m67g

On 10/21/05, Roxana Angheluta [EMAIL PROTECTED] wrote:

Thank you, Yonik, it seems this is the case. What can we do in this case? Would running the program with java -d32 be a solution?

Thanks again,
roxana

One possibility: if Lucene runs out of memory while adding or optimizing, it can leave unused files behind that increase the size of the index. A 64-bit JVM will require more memory than a 32-bit one due to the size of all references being doubled.

If you are using the compound file format (the default; check for .cfs files), then it's easy to check if you have this problem by seeing if there are any *.f* files in the index directory. These are intermediate files and shouldn't exist for long in a compound-file index.

-Yonik
Now hiring -- http://tinyurl.com/7m67g
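Points 2 and 3 above can be checked from inside the running process with standard Java APIs. A small sketch; the class name and the 1.5G threshold constant are ours (the threshold encodes Yonik's rule of thumb, not a hard limit):

```java
// Quick runtime check of the actual heap ceiling before blaming the
// index: Runtime.maxMemory() reflects what -Xmx gave the JVM.
public class HeapAdvice {

    // ~1.5G, Yonik's rule of thumb for when a 32-bit JVM still suffices
    // (and can even be faster, since smaller references use the CPU
    // cache better).
    static final long THRESHOLD_BYTES = 1536L * 1024 * 1024;

    static boolean thirtyTwoBitSuffices(long requiredHeapBytes) {
        return requiredHeapBytes <= THRESHOLD_BYTES;
    }

    public static void main(String[] args) {
        long max = Runtime.getRuntime().maxMemory();
        System.out.println("max heap: " + max + " bytes, 32-bit JVM would suffice: "
                + thirtyTwoBitSuffices(max));
    }
}
```

If the printed maximum is much lower than expected, the -Xmx setting is not reaching the JVM, which is worth ruling out before any index-level debugging.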
RE: java on 64 bits
I have seen quite a few posts on using the 1.9 dev version for production use. How stable is it? Is it really ready for production? I would like to use it, but I never put beta packages in production... but then again, I'm always dealing with Microsoft :)

Tom

-Original Message-
From: Yonik Seeley [mailto:[EMAIL PROTECTED]
Sent: Friday, October 21, 2005 9:28 AM
To: java-user@lucene.apache.org
Subject: Re: java on 64 bits

1) Make sure the failure was due to an OutOfMemoryError and not something else.

2) If you have enough memory, increase the max JVM heap size (-Xmx).

3) If you don't need more than 1.5G or so of heap, use the 32-bit JVM instead (depending on architecture, it can actually be a little faster because more references fit in the CPU cache).

4) See how many indexed fields you have and whether you can consolidate any of them.

4.5) If you don't have too many indexed fields, and have enough spare file descriptors, try using the non-compound file format instead.

5) Run with the latest version of Lucene (the 1.9 dev version), which may have better memory usage during optimize/segment merges.

6) If/when optional norms (http://issues.apache.org/jira/browse/LUCENE-448) makes it into Lucene, you can apply it to any indexed fields for which you don't need index-time boosting or length normalization.

As for getting rid of your current intermediate files, I'd rebuild from scratch just to ensure things are OK.

-Yonik
Now hiring -- http://tinyurl.com/7m67g

On 10/21/05, Roxana Angheluta [EMAIL PROTECTED] wrote:

Thank you, Yonik, it seems this is the case. What can we do in this case? Would running the program with java -d32 be a solution?

Thanks again,
roxana

One possibility: if Lucene runs out of memory while adding or optimizing, it can leave unused files behind that increase the size of the index. A 64-bit JVM will require more memory than a 32-bit one due to the size of all references being doubled.

If you are using the compound file format (the default; check for .cfs files), then it's easy to check if you have this problem by seeing if there are any *.f* files in the index directory. These are intermediate files and shouldn't exist for long in a compound-file index.

-Yonik
Now hiring -- http://tinyurl.com/7m67g
Re: java on 64 bits
Hi,

Also, I think you may try increasing the indexInterval. It is set to 128, but making it larger makes the .tii files smaller. Since .tii files are loaded into memory as a whole, your memory usage might be smaller. However, this change might also affect your search speed, so be careful about the value you set; don't go too high.

Just my thoughts, hope this helps.

Jian

On 10/21/05, Aigner, Thomas [EMAIL PROTECTED] wrote:

I have seen quite a few posts on using the 1.9 dev version for production use. How stable is it? Is it really ready for production? I would like to use it, but I never put beta packages in production... but then again, I'm always dealing with Microsoft :)

Tom

-Original Message-
From: Yonik Seeley [mailto:[EMAIL PROTECTED]
Sent: Friday, October 21, 2005 9:28 AM
To: java-user@lucene.apache.org
Subject: Re: java on 64 bits

1) Make sure the failure was due to an OutOfMemoryError and not something else.

2) If you have enough memory, increase the max JVM heap size (-Xmx).

3) If you don't need more than 1.5G or so of heap, use the 32-bit JVM instead (depending on architecture, it can actually be a little faster because more references fit in the CPU cache).

4) See how many indexed fields you have and whether you can consolidate any of them.

4.5) If you don't have too many indexed fields, and have enough spare file descriptors, try using the non-compound file format instead.

5) Run with the latest version of Lucene (the 1.9 dev version), which may have better memory usage during optimize/segment merges.

6) If/when optional norms (http://issues.apache.org/jira/browse/LUCENE-448) makes it into Lucene, you can apply it to any indexed fields for which you don't need index-time boosting or length normalization.

As for getting rid of your current intermediate files, I'd rebuild from scratch just to ensure things are OK.

-Yonik
Now hiring -- http://tinyurl.com/7m67g

On 10/21/05, Roxana Angheluta [EMAIL PROTECTED] wrote:

Thank you, Yonik, it seems this is the case. What can we do in this case? Would running the program with java -d32 be a solution?

Thanks again,
roxana

One possibility: if Lucene runs out of memory while adding or optimizing, it can leave unused files behind that increase the size of the index. A 64-bit JVM will require more memory than a 32-bit one due to the size of all references being doubled.

If you are using the compound file format (the default; check for .cfs files), then it's easy to check if you have this problem by seeing if there are any *.f* files in the index directory. These are intermediate files and shouldn't exist for long in a compound-file index.

-Yonik
Now hiring -- http://tinyurl.com/7m67g
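The memory tradeoff behind the indexInterval suggestion can be estimated with simple arithmetic: the .tii file holds roughly one entry per indexInterval terms, and it is loaded fully into memory. The helper below is our own illustration, not a Lucene API:

```java
// Back-of-the-envelope estimate of how indexInterval affects the
// number of term-index entries held in memory (from the .tii file).
// Larger intervals mean fewer in-memory entries, but each term lookup
// may linearly scan up to indexInterval entries in the .tis file,
// which is the search-speed cost mentioned above.
public class TiiEstimate {

    static long inMemoryTerms(long totalTerms, int indexInterval) {
        // one in-memory entry per indexInterval terms, rounded up
        return (totalTerms + indexInterval - 1) / indexInterval;
    }

    public static void main(String[] args) {
        long terms = 100000000L; // e.g. 100M unique terms in the index
        System.out.println(inMemoryTerms(terms, 128));  // default interval
        System.out.println(inMemoryTerms(terms, 1024)); // larger interval
    }
}
```

For 100M unique terms, going from an interval of 128 to 1024 cuts the in-memory entries by a factor of 8, at the cost of up to 8x more sequential term reads per dictionary lookup.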
java on 64 bits
Hi everybody!

We have a large Lucene index which gets updated very often. Until recently the Java virtual machine used to manage the index was 32-bit, although the program was running on a 64-bit machine. Last week we switched to 64-bit Java, and since then we have experienced strange problems: the index grows very large. I'm not sure the two are related; that's why I ask here: is it possible that the index got corrupted after we updated the JVM? Is there any relation between the size of the index and the JVM used?

I hope the questions make sense,
thanks,
roxana
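When debugging a 32-bit vs 64-bit question like this, it helps to confirm which data model the process actually runs under. A minimal sketch using standard system properties; note that sun.arch.data.model is Sun/Oracle-specific and may be absent on other vendors' JVMs:

```java
// Report which JVM data model this process is running under.
public class JvmBits {

    static String describe() {
        // "32" or "64" on Sun/Oracle JVMs; may be null elsewhere.
        String model = System.getProperty("sun.arch.data.model");
        // os.arch is always set (e.g. "amd64", "x86").
        String arch = System.getProperty("os.arch");
        return (model != null ? model + "-bit JVM" : "unknown data model")
                + " on " + arch;
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Running this inside the indexing process removes any doubt about whether the switch to 64-bit Java, or a flag like -d32, actually took effect.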