On 02/12/13 07:43 PM, Jeff Tan wrote:
>> In the tests I did on this platform, I found that in general
>> Ray needed between 400 MiB and 1 GiB of RAM per MPI rank.
>
> We must be doing something wrong then, because we were crashing out with
> debug messages indicating over 2 GB in some ranks at least:
>
> Rank 130: assembler memory usage: 2247572 KiB
> 2013-11-05 13:56:11.462 (FATAL) [0xfffa41b8bb0]
> 844455:ibm.runjob.client.Job: terminated due to: killing the job timed
> out
> 2013-11-05 13:56:11.463 (FATAL) [0xfffa41b8bb0]
> 844455:ibm.runjob.client.Job: abnormal termination by signal 11 from
> rank 386. Delivering SIGKILL with status RUNNING timed out after 60
> seconds
> 2013-11-05 13:56:11.464 (FATAL) [0xfffa41b8bb0]
> 844455:ibm.runjob.client.Job: 1 RAS event
> 2013-11-05 13:56:11.464 (FATAL) [0xfffa41b8bb0]
> 844455:ibm.runjob.client.Job: most recent RAS event text: killing job
> 844455 timed out after 60 seconds. 256 nodes are now unavailable.
>
> Is there something in the attached configuration file that is allowing
> such excess in memory utilization? We would appreciate any assistance
> as our experience with Ray is very limited at this point.
>
> Regards
>
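For scale, the "Rank 130" line above reports 2247572 KiB, which is roughly 2.14 GiB for that single rank. Whether that fits depends on how the node's memory is divided among ranks. The sketch below is only an illustration of that arithmetic: the 16 GiB per node figure is an assumption about the hardware, the helper names are hypothetical, and the even split ignores whatever the compute node kernel and MPI buffers reserve.

```python
# Rough per-rank memory budget check (illustrative sketch, not part of Ray).
KIB_PER_GIB = 1024 * 1024


def kib_to_gib(kib):
    """Convert KiB to GiB."""
    return kib / KIB_PER_GIB


def per_rank_budget_gib(node_ram_gib, ranks_per_node):
    """Evenly split a node's RAM across its ranks (ignores OS overhead)."""
    return node_ram_gib / ranks_per_node


# Value taken from the "Rank 130: assembler memory usage" log line.
reported_kib = 2247572
reported_gib = kib_to_gib(reported_kib)

# Assumption: 16 GiB of RAM per Blue Gene/Q node.
NODE_RAM_GIB = 16

print(f"rank 130 reported {reported_gib:.2f} GiB")
for ranks_per_node in (1, 2, 4, 8, 16, 32, 64):
    budget = per_rank_budget_gib(NODE_RAM_GIB, ranks_per_node)
    verdict = "fits" if reported_gib <= budget else "exceeds budget"
    print(f"{ranks_per_node:2d} ranks/node: {budget:6.2f} GiB per rank, {verdict}")
```

Under these assumptions, a rank needing about 2.14 GiB only fits when the node runs 4 ranks or fewer, which is why the rank count per node matters here.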
How many ranks are you using per Blue Gene node?

Also, you may be interested in this:

A catalog of IBM Blue Gene/Q errors (for science)
http://dskernel.blogspot.ca/2013/02/a-catalog-of-ibm-blue-geneq-error-for.html

_______________________________________________
Denovoassembler-users mailing list
Denovoassembler-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/denovoassembler-users