#PBS -l nodes=4:ppn=4 will request four nodes with four processors per node.
#PBS -l nodes=4:ppn=1 will request four nodes with one processor per node.

The MPI problem is a separate issue...

--Joe
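To illustrate what the nodes=4:ppn=4 request hands to the job, here is a minimal sketch of a job script. It relies on Torque writing one line per allocated processor to $PBS_NODEFILE, so each hostname appears ppn times. The helper names unique_hosts and count_hosts are hypothetical, made up for this example, and the sketch only inspects the allocation; it does not launch MPI.

```shell
#!/bin/sh
#PBS -l nodes=4:ppn=4
# Sketch only: inspect the allocation Torque gives the job.
# unique_hosts and count_hosts are hypothetical helper names.

# Print each hostname from a node file once, in first-seen order.
unique_hosts() {
    awk '!seen[$0]++' "$1"
}

# Print the number of distinct hosts listed in a node file.
count_hosts() {
    unique_hosts "$1" | wc -l | tr -d ' '
}

# Only report when running inside a Torque job, where PBS_NODEFILE is set.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    echo "processors allocated: $(wc -l < "$PBS_NODEFILE" | tr -d ' ')"
    echo "distinct nodes: $(count_hosts "$PBS_NODEFILE")"
    unique_hosts "$PBS_NODEFILE"
fi
```

With nodes=4:ppn=4 this should report 16 processors on 4 distinct nodes. To run one process per node while still holding all four cores, the de-duplicated list could be passed to the MPI launcher as its host file with a process count of 4; the launcher options depend on the MPI installation and are left out here.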
________________________________
From: [EMAIL PROTECTED] on behalf of Mary Ellen Fitzpatrick
Sent: Fri 10/31/2008 11:45 AM
To: [email protected]; Mary Ellen Fitzpatrick
Subject: [Mauiusers] mpi job on multi-core nodes,fails to run on multiple nodes

Hi,

I am trying to figure out whether this is a Maui or an MPI issue. I have a 48-node Linux cluster (dual dual-core CPUs) with torque-2.3.3, maui-3.2.6p19, and mpich2-1.07 installed. I am not sure whether I have Maui configured correctly.

What I want to do is submit an MPI job that runs one process per node, requests all 4 cores on that node, and runs on 4 nodes.

If I request 1 node with 4 processors in my PBS script, it works fine:

#PBS -l nodes=1:ppn=4

Everything runs on one node with 4 CPUs, and the MPI output says everything ran perfectly.

If I request 4 nodes with 4 processors each in my PBS script, it fails:

#PBS -l nodes=4:ppn=4

My epilogue/prologue output file says the job ran on 4 nodes and requested 16 processors, but my MPI output file says it crashed:

--snippet--
Initializing MPI Routines...
Initializing MPI Routines...
Initializing MPI Routines...
Initializing MPI Routines...
rank 15 in job 29 node1047_40014 caused collective abort of all ranks
  exit status of rank 15: killed by signal 9
rank 13 in job 29 node1047_40014 caused collective abort of all ranks
  exit status of rank 13: killed by signal 9
rank 12 in job 29 node1047_40014 caused collective abort of all ranks
  exit status of rank 12: return code 0
--snippet--

Pertinent maui.cfg info:

JOBPRIOACCRUALPOLOCY ALWAYS # accrue priority as soon as job is submitted
JOBNODEMATCHPOLICY EXACTNODE
NODEALLOCATIONPOLICY MINRESOURCE
NODEACCESSPOLICY SHARED

/var/spool/torque/server_priv/nodes file:

node1048 np=4
etc

Torque queue info:

set queue spartans queue_type = Execution
set queue spartans resources_default.neednodes = spartans
set queue spartans resources_default.nodes = 1
set queue spartans enabled = True
set queue spartans started = True

Does anyone know why my MPI job is crashing, or whether this is a Maui/Torque issue or an MPI issue?

--
Thanks
Mary Ellen
_______________________________________________
mauiusers mailing list
[email protected]
http://www.supercluster.org/mailman/listinfo/mauiusers
