Hello!

I am having a problem using MPI_Comm_spawn under Torque. It doesn't work when 
spawning more than 12 processes across several nodes. To be more precise, 
it sometimes works and sometimes doesn't!

Here is my case. I request 5 nodes with 3 cores per node, and my 
$PBS_NODEFILE looks like this:

node1
node1
node1
node2
node2
node2
node3
node3
node3
node4
node4
node4
node5
node5
node5
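
For anyone who wants to double-check the slot count in a node file like the one above, here is a minimal sketch in C. It assumes one hostname per line, as in the listing. The names count_slots, total_slots and struct node_slots, and the fixed limits, are hypothetical and only for illustration; they are not part of Torque or of any MPI library.

```c
#include <stdio.h>
#include <string.h>

#define MAX_NODES 64    /* illustrative limit on distinct hosts */
#define MAX_NAME  256   /* illustrative limit on hostname length */

struct node_slots {
    char name[MAX_NAME];
    int  slots;
};

/* Read one hostname per line from `in` and tally how often each one
 * appears. Returns the number of distinct hosts, or -1 if more than
 * max_nodes distinct hosts are found. Lines longer than MAX_NAME - 1
 * characters are not handled properly in this sketch. */
int count_slots(FILE *in, struct node_slots *out, int max_nodes)
{
    char line[MAX_NAME];
    int n = 0;

    while (fgets(line, sizeof line, in) != NULL) {
        line[strcspn(line, "\r\n")] = '\0';   /* strip the line ending */
        if (line[0] == '\0')
            continue;                         /* skip blank lines */

        int i;
        for (i = 0; i < n; i++)
            if (strcmp(out[i].name, line) == 0)
                break;                        /* host already seen */

        if (i == n) {                         /* first time we see this host */
            if (n == max_nodes)
                return -1;
            strncpy(out[n].name, line, MAX_NAME - 1);
            out[n].name[MAX_NAME - 1] = '\0';
            out[n].slots = 0;
            n++;
        }
        out[i].slots++;
    }
    return n;
}

/* Sum the slots over the first n entries. */
int total_slots(const struct node_slots *nodes, int n)
{
    int total = 0;
    for (int i = 0; i < n; i++)
        total += nodes[i].slots;
    return total;
}
```

Opening the file named by getenv("PBS_NODEFILE") with fopen and passing it to count_slots should report 5 hosts with 3 slots each for the listing above, 15 slots in total. With 3 of them taken by the initial processes, that leaves 12 for spawned children.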

I started a hello program (which just spawns itself; the children, of course, 
don't spawn) with 

mpiexec -np 3 ./hello

Spawning 3 more processes (on node 2) - works!
Spawning 6 more processes (nodes 2 and 3) - works!
Spawning 9 more processes (nodes 2, 3, 4) - "sometimes" OK, "sometimes" not!
Spawning 12 more processes (nodes 2, 3, 4, 5) - "mostly" not!

Ideally I want to spawn about 32 processes across a large number of nodes, but 
at the moment this is impossible. I have attached my hello program to this email. 

I will be happy to provide more info or verbose output if you tell me what 
exactly you would like to see.

Best,
Suraj

Attachment: hello.c