Folks,

The intercomm_create test case from the IBM test suite can hang under
some configurations.

Basically, it spawns n tasks in a first communicator, and then n
tasks in a second communicator.

When I run from node0:
mpirun -np 1 --mca btl tcp,self --mca coll ^ml -host node1,node2 ./intercomm_create

the second spawn hangs.
A simple workaround is to use 3 hosts:
mpirun -np 1 --mca btl tcp,self --mca coll ^ml -host node1,node2,node3 ./intercomm_create

The second spawn creates its task on node2.
For reasons I cannot fully understand, PMIx believes the orteds of both
node1 and node2 are involved in the allgather.
Since node1 is not involved whatsoever, the program hangs.
/* in create_dmns, orte_get_job_data_object(sig->signature[0].jobid)
returns jdata with jdata->map->num_nodes = 2 */

Cheers,

Gilles
