Hi,

It looks like there is a problem in trunk which reproduces with the simple_spawn test (orte/test/mpi/simple_spawn.c). It seems to be an issue with pmix. It doesn't reproduce with the default set of BTLs, but it does reproduce when certain BTLs are specified explicitly. For example,
salloc -N5 $OMPI_HOME/install/bin/mpirun -np 33 --map-by node -mca coll ^ml -display-map -mca orte_debug_daemons true --leave-session-attached --debug-daemons -mca pml ob1 -mca btl *tcp,self* ./orte/test/mpi/simple_spawn

gets:

simple_spawn: ../../ompi/group/group_init.c:215: ompi_group_increment_proc_count: Assertion `((0xdeafbeedULL << 32) + 0xdeafbeedULL) == ((opal_object_t *) (proc_pointer))->obj_magic_id' failed.
[sputnik3.vbench.com:28888] [[41877,0],3] orted_cmd: exit cmd, but proc [[41877,1],2] is alive
[sputnik5][[41877,1],29][../../../../../opal/mca/btl/tcp/btl_tcp_endpoint.c:675:mca_btl_tcp_endpoint_complete_connect] connect() to 192.168.1.42 failed: Connection refused (111)

salloc -N1 $OMPI_HOME/install/bin/mpirun -np 3 --map-by node -mca coll ^ml -display-map -mca orte_debug_daemons true --leave-session-attached --debug-daemons -mca pml ob1 -mca btl *sm,self* ./orte/test/mpi/simple_spawn

fails with:

At least one pair of MPI processes are unable to reach each other for MPI communications. This means that no Open MPI device has indicated that it can be used to communicate between these processes. This is an error; Open MPI requires that all MPI processes be able to reach each other. This error can sometimes be the result of forgetting to specify the "self" BTL.

  Process 1 ([[59481,2],0]) is on host: sputnik1
  Process 2 ([[59481,1],0]) is on host: sputnik1
  BTLs attempted: self sm

Your MPI job is now going to abort; sorry.
--------------------------------------------------------------------------
[sputnik1.vbench.com:22156] [[59481,1],2] ORTE_ERROR_LOG: Unreachable in file ../../../../../ompi/mca/dpm/orte/dpm_orte.c at line 485

salloc -N1 $OMPI_HOME/install/bin/mpirun -np 3 --map-by node -mca coll ^ml -display-map -mca orte_debug_daemons true --leave-session-attached --debug-daemons -mca pml ob1 -mca btl *openib,self* ./orte/test/mpi/simple_spawn

also doesn't work:

At least one pair of MPI processes are unable to reach each other for MPI communications. This means that no Open MPI device has indicated that it can be used to communicate between these processes. This is an error; Open MPI requires that all MPI processes be able to reach each other. This error can sometimes be the result of forgetting to specify the "self" BTL.

  Process 1 ([[60046,1],13]) is on host: sputnik4
  Process 2 ([[60046,2],1]) is on host: sputnik4
  BTLs attempted: openib self

Your MPI job is now going to abort; sorry.
--------------------------------------------------------------------------
[sputnik4.vbench.com:25476] [[60046,1],3] ORTE_ERROR_LOG: Unreachable in file ../../../../../ompi/mca/dpm/orte/dpm_orte.c at line 485

*But* the combination ^sm,openib seems to work.

I tried several revisions going back to the beginning of October; the problem reproduces on all of them.

Best regards,
Elena
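P.S. For anyone not familiar with the test: simple_spawn exercises the dynamic-process path (MPI_Comm_spawn plus MPI_Comm_disconnect), which is where the assertion in ompi_group_increment_proc_count fires. The sketch below is my own minimal reproduction of that pattern, not the actual simple_spawn.c source; it assumes the binary spawns copies of itself.

```c
/* Minimal sketch of the MPI_Comm_spawn pattern exercised by the test.
 * NOTE: this is an illustrative stand-in, not the real simple_spawn.c. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    MPI_Comm parent, child;

    MPI_Init(&argc, &argv);
    MPI_Comm_get_parent(&parent);

    if (MPI_COMM_NULL == parent) {
        /* Parent side: spawn 3 copies of this same binary.
         * The children inherit no arguments (MPI_ARGV_NULL). */
        MPI_Comm_spawn(argv[0], MPI_ARGV_NULL, 3, MPI_INFO_NULL,
                       0, MPI_COMM_SELF, &child, MPI_ERRCODES_IGNORE);
        MPI_Comm_disconnect(&child);
    } else {
        /* Child side: the intercommunicator back to the parent is
         * obtained via MPI_Comm_get_parent; just disconnect it. */
        MPI_Comm_disconnect(&parent);
    }

    MPI_Finalize();
    return 0;
}
```

In the failing runs above, it is the reachability computation between the parent job and the freshly spawned job (dpm_orte.c, "Unreachable") that breaks when a non-default BTL list is forced.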