Works fine for me:

[rhc@bend001 mpi]$ mpirun -n 3 --host bend001 ./simple_spawn
[pid 22777] starting up!
[pid 22778] starting up!
[pid 22779] starting up!
1 completed MPI_Init
Parent [pid 22778] about to spawn!
2 completed MPI_Init
Parent [pid 22779] about to spawn!
0 completed MPI_Init
Parent [pid 22777] about to spawn!
[pid 22783] starting up!
[pid 22784] starting up!
Parent done with spawn
Parent sending message to child
Parent done with spawn
Parent done with spawn
0 completed MPI_Init
Hello from the child 0 of 2 on host bend001 pid 22783
Child 0 received msg: 38
1 completed MPI_Init
Hello from the child 1 of 2 on host bend001 pid 22784
Child 1 disconnected
Parent disconnected
Parent disconnected
Parent disconnected
Child 0 disconnected
22784: exiting
22778: exiting
22779: exiting
22777: exiting
22783: exiting
[rhc@bend001 mpi]$ make spawn_multiple
mpicc -g --openmpi:linkall spawn_multiple.c -o spawn_multiple
[rhc@bend001 mpi]$ mpirun -n 3 --host bend001 ./spawn_multiple
Parent [pid 22797] about to spawn!
Parent [pid 22798] about to spawn!
Parent [pid 22799] about to spawn!
Parent done with spawn
Parent done with spawn
Parent sending message to children
Parent done with spawn
Hello from the child 0 of 2 on host bend001 pid 22803: argv[1] = foo
Child 0 received msg: 38
Hello from the child 1 of 2 on host bend001 pid 22804: argv[1] = bar
Child 1 disconnected
Parent disconnected
Parent disconnected
Parent disconnected
Child 0 disconnected
[rhc@bend001 mpi]$ mpirun -n 3 --host bend001 -mca coll ^ml ./intercomm_create
b: MPI_Intercomm_create( intra, 0, intra, MPI_COMM_NULL, 201, &inter) [rank 3]
b: MPI_Intercomm_create( intra, 0, intra, MPI_COMM_NULL, 201, &inter) [rank 4]
b: MPI_Intercomm_create( intra, 0, intra, MPI_COMM_NULL, 201, &inter) [rank 5]
c: MPI_Intercomm_create( MPI_COMM_WORLD, 0, intra, 0, 201, &inter) [rank 3]
c: MPI_Intercomm_create( MPI_COMM_WORLD, 0, intra, 0, 201, &inter) [rank 4]
c: MPI_Intercomm_create( MPI_COMM_WORLD, 0, intra, 0, 201, &inter) [rank 5]
a: MPI_Intercomm_create( ab_intra, 0, ac_intra, 3, 201, &inter) (0)
a: MPI_Intercomm_create( ab_intra, 0, ac_intra, 3, 201, &inter) (0)
a: MPI_Intercomm_create( ab_intra, 0, ac_intra, 3, 201, &inter) (0)
b: intercomm_create (0)
b: barrier on inter-comm - before
b: barrier on inter-comm - after
b: intercomm_create (0)
b: barrier on inter-comm - before
b: barrier on inter-comm - after
c: intercomm_create (0)
c: barrier on inter-comm - before
c: barrier on inter-comm - after
c: intercomm_create (0)
c: barrier on inter-comm - before
c: barrier on inter-comm - after
a: intercomm_create (0)
a: barrier on inter-comm - before
a: barrier on inter-comm - after
c: intercomm_create (0)
c: barrier on inter-comm - before
c: barrier on inter-comm - after
a: intercomm_create (0)
a: barrier on inter-comm - before
a: barrier on inter-comm - after
a: intercomm_create (0)
a: barrier on inter-comm - before
a: barrier on inter-comm - after
b: intercomm_create (0)
b: barrier on inter-comm - before
b: barrier on inter-comm - after
a: intercomm_merge(0) (0) [rank 2]
c: intercomm_merge(0) (0) [rank 8]
a: intercomm_merge(0) (0) [rank 0]
a: intercomm_merge(0) (0) [rank 1]
c: intercomm_merge(0) (0) [rank 7]
b: intercomm_merge(1) (0) [rank 4]
b: intercomm_merge(1) (0) [rank 5]
c: intercomm_merge(0) (0) [rank 6]
b: intercomm_merge(1) (0) [rank 3]
a: barrier (0)
b: barrier (0)
c: barrier (0)
a: barrier (0)
c: barrier (0)
b: barrier (0)
a: barrier (0)
c: barrier (0)
b: barrier (0)
dpm_base_disconnect_init: error -12 in isend to process 3
dpm_base_disconnect_init: error -12 in isend to process 3
dpm_base_disconnect_init: error -12 in isend to process 3
dpm_base_disconnect_init: error -12 in isend to process 0
dpm_base_disconnect_init: error -12 in isend to process 3
dpm_base_disconnect_init: error -12 in isend to process 3
dpm_base_disconnect_init: error -12 in isend to process 3
dpm_base_disconnect_init: error -12 in isend to process 0
dpm_base_disconnect_init: error -12 in isend to process 3
dpm_base_disconnect_init: error -12 in isend to process 3
dpm_base_disconnect_init: error -12 in isend to process 3
dpm_base_disconnect_init: error -12 in isend to process 1
dpm_base_disconnect_init: error -12 in isend to process 3
[rhc@bend001 mpi]$
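For anyone following along without the IBM test suite handy, the pattern the spawn tests exercise is roughly the one below. This is only a minimal sketch I put together to mirror the output above (the msg value 38, the print strings, and the two-child spawn are modeled on that output), not the actual simple_spawn source:

/* spawn_sketch.c - minimal MPI_Comm_spawn example (hypothetical, not the IBM test)
 * Build: mpicc -g spawn_sketch.c -o spawn_sketch
 * Run:   mpirun -n 3 ./spawn_sketch
 */
#include <mpi.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    MPI_Comm parent, intercomm;
    int rank, size, len, msg = 38;
    char host[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_get_parent(&parent);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Get_processor_name(host, &len);

    if (MPI_COMM_NULL == parent) {
        /* Parent side: MPI_Comm_spawn is collective over MPI_COMM_WORLD,
         * so the three parents together launch one set of two children. */
        printf("Parent [pid %ld] about to spawn!\n", (long)getpid());
        MPI_Comm_spawn(argv[0], MPI_ARGV_NULL, 2, MPI_INFO_NULL,
                       0, MPI_COMM_WORLD, &intercomm, MPI_ERRCODES_IGNORE);
        printf("Parent done with spawn\n");
        if (0 == rank) {
            printf("Parent sending message to child\n");
            MPI_Send(&msg, 1, MPI_INT, 0, 1, intercomm);
        }
        MPI_Comm_disconnect(&intercomm);
        printf("Parent disconnected\n");
    } else {
        /* Child side: report in, child rank 0 receives the message from
         * parent rank 0 over the inter-communicator, then disconnect. */
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        printf("Hello from the child %d of %d on host %s pid %ld\n",
               rank, size, host, (long)getpid());
        if (0 == rank) {
            MPI_Recv(&msg, 1, MPI_INT, 0, 1, parent, MPI_STATUS_IGNORE);
            printf("Child %d received msg: %d\n", rank, msg);
        }
        MPI_Comm_disconnect(&parent);
        printf("Child %d disconnected\n", rank);
    }

    MPI_Finalize();
    return 0;
}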
On Jun 6, 2014, at 11:26 AM, Rolf vandeVaart <rvandeva...@nvidia.com> wrote:

> I am seeing an interesting failure on trunk. intercomm_create, spawn, and
> spawn_multiple from the IBM tests hang if I explicitly list the hostnames to
> run on. For example:
>
> Good:
> $ mpirun -np 2 --mca btl self,sm,tcp spawn_multiple
> Parent: 0 of 2, drossetti-ivy0.nvidia.com (0 in init)
> Parent: 1 of 2, drossetti-ivy0.nvidia.com (0 in init)
> Child: 0 of 4, drossetti-ivy0.nvidia.com (this is job 1) (1 in init)
> Child: 1 of 4, drossetti-ivy0.nvidia.com (this is job 1) (1 in init)
> Child: 2 of 4, drossetti-ivy0.nvidia.com (this is job 2) (1 in init)
> Child: 3 of 4, drossetti-ivy0.nvidia.com (this is job 2) (1 in init)
> $
>
> Bad:
> $ mpirun -np 2 --mca btl self,sm,tcp -host drossetti-ivy0,drossetti-ivy0 spawn_multiple
> Parent: 0 of 2, drossetti-ivy0.nvidia.com (1 in init)
> Parent: 1 of 2, drossetti-ivy0.nvidia.com (1 in init)
> Child: 0 of 4, drossetti-ivy0.nvidia.com (this is job 1) (1 in init)
> Child: 1 of 4, drossetti-ivy0.nvidia.com (this is job 1) (1 in init)
> Child: 2 of 4, drossetti-ivy0.nvidia.com (this is job 2) (1 in init)
> Child: 3 of 4, drossetti-ivy0.nvidia.com (this is job 2) (1 in init)
> [..and we are hung here...]
>
> I see the exact same behavior for spawn and spawn_multiple. Ralph, any
> thoughts? Open MPI 1.8 is fine. I can provide more information if needed,
> but I assume this is reproducible.
>
> Thanks,
> Rolf
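For completeness, the spawn_multiple case being reported boils down to a call of roughly this shape. Again, this is a hypothetical sketch modeled on the output in the quoted mail (the "job 1"/"job 2" arguments, the two commands of two processes each, and the print formats are my assumptions), not the actual IBM spawn_multiple.c; note that in Rolf's "bad" run the children do print their greetings, so the hang shows up after the spawn itself has completed:

/* spawn_multiple_sketch.c - minimal MPI_Comm_spawn_multiple example (hypothetical) */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Comm parent, intercomm;
    int rank, size, len;
    char host[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_get_parent(&parent);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(host, &len);

    if (MPI_COMM_NULL == parent) {
        /* Parent: launch two "jobs" of 2 processes each in a single
         * MPI_Comm_spawn_multiple call (collective over MPI_COMM_WORLD). */
        char *cmds[2]     = { argv[0], argv[0] };
        char *args1[]     = { "job 1", NULL };
        char *args2[]     = { "job 2", NULL };
        char **argvs[2]   = { args1, args2 };
        int maxprocs[2]   = { 2, 2 };
        MPI_Info infos[2] = { MPI_INFO_NULL, MPI_INFO_NULL };

        printf("Parent: %d of %d, %s\n", rank, size, host);
        MPI_Comm_spawn_multiple(2, cmds, argvs, maxprocs, infos, 0,
                                MPI_COMM_WORLD, &intercomm,
                                MPI_ERRCODES_IGNORE);
        MPI_Comm_disconnect(&intercomm);
    } else {
        /* Child: argv[1] identifies which of the two commands launched us. */
        printf("Child: %d of %d, %s (this is %s)\n", rank, size, host,
               argc > 1 ? argv[1] : "unknown");
        MPI_Comm_disconnect(&parent);
    }

    MPI_Finalize();
    return 0;
}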