On 02/12/13 07:43 PM, Jeff Tan wrote:
>> In the tests I did on this platform, I found that in general
>> Ray needed between 400 MiB and 1 GiB of RAM per MPI rank.
>
> We must be doing something wrong then, because we were crashing out with
> debug messages indicating over 2 GB in some ranks at least:
>
> Rank 130: assembler memory usage: 2247572 KiB
> 2013-11-05 13:56:11.462 (FATAL) [0xfffa41b8bb0]
> 844455:ibm.runjob.client.Job: terminated due to: killing the job timed
> out
> 2013-11-05 13:56:11.463 (FATAL) [0xfffa41b8bb0]
> 844455:ibm.runjob.client.Job: abnormal termination by signal 11 from
> rank 386. Delivering SIGKILL with status RUNNING timed out after 60
> seconds
> 2013-11-05 13:56:11.464 (FATAL) [0xfffa41b8bb0]
> 844455:ibm.runjob.client.Job: 1 RAS event
> 2013-11-05 13:56:11.464 (FATAL) [0xfffa41b8bb0]
> 844455:ibm.runjob.client.Job: most recent RAS event text: killing job
> 844455 timed out after 60 seconds. 256 nodes are now unavailable.
>
> Is there something in the attached configuration file that is allowing
> such excess in memory utilization? We would appreciate any assistance
> as our experience with Ray is very limited at this point.
>
> Regards
>

How many ranks are you using per Blue Gene node?
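
The reason for asking: the per-rank figure only matters relative to how much RAM each node has and how many ranks share it. Here is a rough sketch in Python for checking this from the job log. It assumes 16 GiB of RAM per Blue Gene/Q compute node and ignores whatever the compute node kernel and MPI buffers take, so treat the result as an upper bound. The function names are made up for illustration; only the log line format comes from the output quoted above.

```python
import re

# Matches the per-rank memory lines quoted above, e.g.
# "Rank 130: assembler memory usage: 2247572 KiB"
USAGE_RE = re.compile(r"Rank (\d+): assembler memory usage: (\d+) KiB")


def peak_usage_kib(log_text):
    """Return {rank: highest reported usage in KiB} for each rank in the log."""
    peaks = {}
    for match in USAGE_RE.finditer(log_text):
        rank, kib = int(match.group(1)), int(match.group(2))
        peaks[rank] = max(kib, peaks.get(rank, 0))
    return peaks


def max_ranks_per_node(peak_kib, node_ram_gib=16):
    """Ranks of this size that fit in one node (16 GiB is an assumption)."""
    return (node_ram_gib * 1024 * 1024) // peak_kib


if __name__ == "__main__":
    log = "> Rank 130: assembler memory usage: 2247572 KiB"
    worst = max(peak_usage_kib(log).values())
    print("peak per rank: %.2f GiB" % (worst / 1024 / 1024))
    print("ranks that fit per node:", max_ranks_per_node(worst))
```

With the 2247572 KiB reported by rank 130, at most 7 ranks of that size fit in a node under this assumption, so a job running 16 or more ranks per node would run out of memory.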


Also, you may be interested in this:


A catalog of IBM Blue Gene/Q errors (for science)
http://dskernel.blogspot.ca/2013/02/a-catalog-of-ibm-blue-geneq-error-for.html

_______________________________________________
Denovoassembler-users mailing list
Denovoassembler-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/denovoassembler-users
