Coll/ml does disqualify itself if processes are not bound. The problem here is that the two sides of the intercommunicator are inconsistent. I can write a quick fix for 1.8.2.
-Nathan
________________________________________
From: devel [devel-boun...@open-mpi.org] on behalf of Gilles Gouaillardet [gilles.gouaillar...@gmail.com]
Sent: Thursday, June 05, 2014 1:20 AM
To: Open MPI Developers
Subject: [OMPI devel] MPI_Comm_spawn affinity and coll/ml

Folks,

On my single-socket, four-core VM (no batch manager), I am running the intercomm_create test from the ibm test suite.

mpirun -np 1 ./intercomm_create
=> OK
mpirun -np 2 ./intercomm_create
=> HANG :-(
mpirun -np 2 --mca coll ^ml ./intercomm_create
=> OK

Basically, the first two tasks call MPI_Comm_spawn (2 tasks) twice, followed by MPI_Intercomm_merge, and the 4 spawned tasks call MPI_Intercomm_merge followed by MPI_Intercomm_create.

I dug a bit into this and found two distinct issues:

1) binding:
tasks [0-1] (launched with mpirun) are bound on cores [0-1] => OK
tasks [2-3] (first spawn) are bound on cores [0-1] => ODD, I would have expected [2-3]
tasks [4-5] (second spawn) are not bound at all => ODD again, could have made sense if tasks [2-3] were bound on cores [2-3]
I observe the same behaviour with the --oversubscribe mpirun parameter.

2) coll/ml:
coll/ml hangs when -np 2 (total 6 tasks, including 2 unbound tasks).
I suspect coll/ml is unable to handle unbound tasks.
If I am correct, should coll/ml detect this and simply disqualify itself automatically?

Cheers,

Gilles
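
For anyone who wants to check the binding observations in 1) independently of Open MPI, here is a minimal sketch, assuming a Linux host. It is not how Open MPI or coll/ml detects binding; `binding_status` is a hypothetical helper that compares the CPUs the calling process is allowed to run on against the number of CPUs the system reports. Inside a container or cpuset the allowed set may already be restricted, so a process can look "bound" here without mpirun having bound it.

```python
import os


def binding_status():
    """Return (allowed_cpus, online_cpus, is_bound) for the calling process.

    Hypothetical helper for illustration only. Linux-specific, because
    os.sched_getaffinity is not available on every platform.
    """
    # CPUs the scheduler may place this process on.
    allowed = sorted(os.sched_getaffinity(0))
    # Logical CPUs reported by the system; fall back to the allowed count
    # if the value cannot be determined.
    online = os.cpu_count() or len(allowed)
    # Treat the process as bound when it is restricted to a strict subset.
    return allowed, online, len(allowed) < online


if __name__ == "__main__":
    allowed, online, bound = binding_status()
    state = "bound" if bound else "not bound"
    print(f"pid {os.getpid()}: {state}, allowed CPUs {allowed} of {online}")
```

Running the same check from each launched and spawned task would show which tasks share cores and which are unrestricted.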