I am able to change the memory size parameters, so if I increase the memory size (currently 2 GB) or add caches, could that be a solution? Or is it the program that is using too much memory?

Thanks really for your input, I appreciate it.

Sandra Guija
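For scale, a back-of-the-envelope sketch, assuming magic10000 stores 8-byte double-precision elements (an assumption; the program's source is not shown in this thread):

    # one 10,000 x 10,000 matrix of 8-byte doubles, assuming double precision
    $ echo $(( 10000 * 10000 * 8 / 1000000 )) MB
    800 MB
    # C = A x B keeps three such matrices resident
    $ echo $(( 3 * 10000 * 10000 * 8 / 1000000 )) MB
    2400 MB

If any single rank holds all three matrices, it needs roughly 2.4 GB, which already exceeds the 2 GB of RAM that free -m reports below; in that case more memory, or distributing the matrix blocks across the four nodes, matters more than cache settings.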
From: [email protected]
Date: Tue, 30 Oct 2012 11:50:28 -0700
To: [email protected]
Subject: Re: [OMPI devel] process kill signal 59

Yeah, you're using too much memory for the shared memory system. Run with -mca btl ^sm on your cmd line - it'll run slower, but you probably don't have a choice.

On Oct 30, 2012, at 11:38 AM, Sandra Guija <[email protected]> wrote:

Yes, I think it is related to my program too: when I run a 1000x1000 matrix multiplication, the program works. When I run the 10,000x10,000 case on only one machine, I get this:

    mca_common_sm_mmap_init: mmap failed with errno=12
    mca_mpool_sm_init: unable to create shared memory mapping (/tmp/openmpi-sessions-mpiu@tango_0/default-universe-1529/1/shared_mem_pool.tango)
    mca_common_sm_mmap_init: /tmp/openmpi-sessions-mpiu@tango_0/default-universe-1529/1/shared_mem_pool.tango failed with errno=2
    mca_mpool_sm_init: unable to create shared memory mapping (/tmp/openmpi-sessions-mpiu@tango_0/default-universe-1529/1/shared_mem_pool.tango)
    PML add procs failed
    --> Returned "Out of resource" (-2) instead of "Success" (0)

This is the result when I run free -m:

                  total   used   free  shared  buffers  cached
    Mem:           2026     54   1972       0        6      25
    -/+ buffers/cache:      22    511
    Swap:           511      0    511

Sandra Guija

From: [email protected]
Date: Tue, 30 Oct 2012 10:33:02 -0700
To: [email protected]
Subject: Re: [OMPI devel] process kill signal 59

Ummm... not sure what I can say about that with so little info. It looks like your process died for some reason that has nothing to do with us - a bug in your "magic10000" program?

On Oct 30, 2012, at 10:24 AM, Sandra Guija <[email protected]> wrote:

Hello, I am running a 10,000x10,000 matrix multiplication on 4 processors (1 core each) and I get the following error:

    mpirun -np 4 --hostfile nodes --bynode magic10000

    mpirun noticed that job rank 1 with PID 635 on node slave1 exited on signal 59 (Real-time signal 25).
    2 additional processes aborted (not shown)
    1 process killed (possibly by Open MPI)

The node file contains:

    master
    slave1
    slave2
    slave3

_______________________________________________
devel mailing list
[email protected]
http://www.open-mpi.org/mailman/listinfo.cgi/devel
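For reference, a sketch of the suggested workaround applied to the command line from the first message in this thread (assuming the same hostfile and executable):

    mpirun -mca btl ^sm -np 4 --hostfile nodes --bynode magic10000

The leading ^ excludes the named component, so Open MPI selects every BTL except the shared-memory one; ranks on the same node then communicate over the network transports instead, which is why the job runs slower.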
