Works fine for me:

[rhc@bend001 mpi]$ mpirun -n 3 --host bend001 ./simple_spawn
[pid 22777] starting up!
[pid 22778] starting up!
[pid 22779] starting up!
1 completed MPI_Init
Parent [pid 22778] about to spawn!
2 completed MPI_Init
Parent [pid 22779] about to spawn!
0 completed MPI_Init
Parent [pid 22777] about to spawn!
[pid 22783] starting up!
[pid 22784] starting up!
Parent done with spawn
Parent sending message to child
Parent done with spawn
Parent done with spawn
0 completed MPI_Init
Hello from the child 0 of 2 on host bend001 pid 22783
Child 0 received msg: 38
1 completed MPI_Init
Hello from the child 1 of 2 on host bend001 pid 22784
Child 1 disconnected
Parent disconnected
Parent disconnected
Parent disconnected
Child 0 disconnected
22784: exiting
22778: exiting
22779: exiting
22777: exiting
22783: exiting
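
For anyone following along, simple_spawn is just a basic MPI_Comm_spawn exercise. A minimal sketch of that pattern (my reconstruction to show the calls involved, not the actual test source, whose details may differ) looks roughly like this:

/* Minimal MPI_Comm_spawn sketch -- illustrative only, not the actual
 * simple_spawn test. The parents collectively spawn 2 copies of the same
 * binary, parent rank 0 sends one int (38, as in the output above) to
 * child rank 0, then both sides disconnect. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    MPI_Comm parent, child;
    int rank, msg = 38;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_get_parent(&parent);

    if (MPI_COMM_NULL == parent) {
        /* parent side: collective spawn of 2 children over MPI_COMM_WORLD */
        MPI_Comm_spawn(argv[0], MPI_ARGV_NULL, 2, MPI_INFO_NULL,
                       0, MPI_COMM_WORLD, &child, MPI_ERRCODES_IGNORE);
        if (0 == rank) {
            MPI_Send(&msg, 1, MPI_INT, 0, 1, child);
        }
        MPI_Comm_disconnect(&child);
    } else {
        /* child side: rank 0 receives the message from parent rank 0 */
        if (0 == rank) {
            MPI_Recv(&msg, 1, MPI_INT, 0, 1, parent, MPI_STATUS_IGNORE);
            printf("Child 0 received msg: %d\n", msg);
        }
        MPI_Comm_disconnect(&parent);
    }

    MPI_Finalize();
    return 0;
}
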
[rhc@bend001 mpi]$ make spawn_multiple
mpicc -g --openmpi:linkall    spawn_multiple.c   -o spawn_multiple
[rhc@bend001 mpi]$ mpirun -n 3 --host bend001 ./spawn_multiple
Parent [pid 22797] about to spawn!
Parent [pid 22798] about to spawn!
Parent [pid 22799] about to spawn!
Parent done with spawn
Parent done with spawn
Parent sending message to children
Parent done with spawn
Hello from the child 0 of 2 on host bend001 pid 22803: argv[1] = foo
Child 0 received msg: 38
Hello from the child 1 of 2 on host bend001 pid 22804: argv[1] = bar
Child 1 disconnected
Parent disconnected
Parent disconnected
Parent disconnected
Child 0 disconnected
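
spawn_multiple follows the same pattern but goes through MPI_Comm_spawn_multiple so each child command gets its own argv (hence the "argv[1] = foo" / "argv[1] = bar" lines above). Again, a rough sketch of the pattern, not the IBM test itself:

/* Rough MPI_Comm_spawn_multiple sketch -- not the actual spawn_multiple
 * source. Two commands, one child each, with different argv so the
 * children can tell themselves apart. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    MPI_Comm parent, child;
    int rank, msg = 38;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_get_parent(&parent);

    if (MPI_COMM_NULL == parent) {
        char *cmds[2]     = { argv[0], argv[0] };
        char *child0[2]   = { "foo", NULL };
        char *child1[2]   = { "bar", NULL };
        char **argvs[2]   = { child0, child1 };
        int maxprocs[2]   = { 1, 1 };
        MPI_Info infos[2] = { MPI_INFO_NULL, MPI_INFO_NULL };

        MPI_Comm_spawn_multiple(2, cmds, argvs, maxprocs, infos, 0,
                                MPI_COMM_WORLD, &child, MPI_ERRCODES_IGNORE);
        if (0 == rank) {
            MPI_Send(&msg, 1, MPI_INT, 0, 1, child);
        }
        MPI_Comm_disconnect(&child);
    } else {
        /* spawned processes see their per-command args starting at argv[1] */
        printf("Child %d: argv[1] = %s\n", rank, argv[1]);
        if (0 == rank) {
            MPI_Recv(&msg, 1, MPI_INT, 0, 1, parent, MPI_STATUS_IGNORE);
        }
        MPI_Comm_disconnect(&parent);
    }

    MPI_Finalize();
    return 0;
}
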
[rhc@bend001 mpi]$ mpirun -n 3 --host bend001 -mca coll ^ml ./intercomm_create
b: MPI_Intercomm_create( intra, 0, intra, MPI_COMM_NULL, 201, &inter) [rank 3]
b: MPI_Intercomm_create( intra, 0, intra, MPI_COMM_NULL, 201, &inter) [rank 4]
b: MPI_Intercomm_create( intra, 0, intra, MPI_COMM_NULL, 201, &inter) [rank 5]
c: MPI_Intercomm_create( MPI_COMM_WORLD, 0, intra, 0, 201, &inter) [rank 3]
c: MPI_Intercomm_create( MPI_COMM_WORLD, 0, intra, 0, 201, &inter) [rank 4]
c: MPI_Intercomm_create( MPI_COMM_WORLD, 0, intra, 0, 201, &inter) [rank 5]
a: MPI_Intercomm_create( ab_intra, 0, ac_intra, 3, 201, &inter) (0)
a: MPI_Intercomm_create( ab_intra, 0, ac_intra, 3, 201, &inter) (0)
a: MPI_Intercomm_create( ab_intra, 0, ac_intra, 3, 201, &inter) (0)
b: intercomm_create (0)
b: barrier on inter-comm - before
b: barrier on inter-comm - after
b: intercomm_create (0)
b: barrier on inter-comm - before
b: barrier on inter-comm - after
c: intercomm_create (0)
c: barrier on inter-comm - before
c: barrier on inter-comm - after
c: intercomm_create (0)
c: barrier on inter-comm - before
c: barrier on inter-comm - after
a: intercomm_create (0)
a: barrier on inter-comm - before
a: barrier on inter-comm - after
c: intercomm_create (0)
c: barrier on inter-comm - before
c: barrier on inter-comm - after
a: intercomm_create (0)
a: barrier on inter-comm - before
a: barrier on inter-comm - after
a: intercomm_create (0)
a: barrier on inter-comm - before
a: barrier on inter-comm - after
b: intercomm_create (0)
b: barrier on inter-comm - before
b: barrier on inter-comm - after
a: intercomm_merge(0) (0) [rank 2]
c: intercomm_merge(0) (0) [rank 8]
a: intercomm_merge(0) (0) [rank 0]
a: intercomm_merge(0) (0) [rank 1]
c: intercomm_merge(0) (0) [rank 7]
b: intercomm_merge(1) (0) [rank 4]
b: intercomm_merge(1) (0) [rank 5]
c: intercomm_merge(0) (0) [rank 6]
b: intercomm_merge(1) (0) [rank 3]
a: barrier (0)
b: barrier (0)
c: barrier (0)
a: barrier (0)
c: barrier (0)
b: barrier (0)
a: barrier (0)
c: barrier (0)
b: barrier (0)
dpm_base_disconnect_init: error -12 in isend to process 3
dpm_base_disconnect_init: error -12 in isend to process 3
dpm_base_disconnect_init: error -12 in isend to process 3
dpm_base_disconnect_init: error -12 in isend to process 0
dpm_base_disconnect_init: error -12 in isend to process 3
dpm_base_disconnect_init: error -12 in isend to process 3
dpm_base_disconnect_init: error -12 in isend to process 3
dpm_base_disconnect_init: error -12 in isend to process 0
dpm_base_disconnect_init: error -12 in isend to process 3
dpm_base_disconnect_init: error -12 in isend to process 3
dpm_base_disconnect_init: error -12 in isend to process 3
dpm_base_disconnect_init: error -12 in isend to process 1
dpm_base_disconnect_init: error -12 in isend to process 3
[rhc@bend001 mpi]$ 
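
The intercomm_create trace maps onto the usual MPI_Intercomm_create / MPI_Intercomm_merge sequence (the ^ in "-mca coll ^ml" simply excludes the ml collective component). The real IBM test wires three groups (a/b/c) together via two spawns; a much simpler, self-contained sketch of the same calls, splitting one job in half instead of spawning, would be:

/* Minimal MPI_Intercomm_create / MPI_Intercomm_merge sketch -- illustrative
 * only, not the IBM test. Split MPI_COMM_WORLD in half, connect the halves
 * with an inter-communicator, merge it back, and barrier on both. */
#include <mpi.h>

int main(int argc, char *argv[])
{
    MPI_Comm half, inter, merged;
    int rank, size, color;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* two local groups: lower half and upper half of MPI_COMM_WORLD */
    color = (rank < size / 2) ? 0 : 1;
    MPI_Comm_split(MPI_COMM_WORLD, color, rank, &half);

    /* local leader is rank 0 of each half; the remote leader is addressed
     * by its rank in the peer communicator (MPI_COMM_WORLD here) */
    int remote_leader = (color == 0) ? size / 2 : 0;
    MPI_Intercomm_create(half, 0, MPI_COMM_WORLD, remote_leader, 201, &inter);

    MPI_Barrier(inter);                  /* barrier on the inter-comm */
    MPI_Intercomm_merge(inter, color, &merged);
    MPI_Barrier(merged);                 /* barrier on the merged intra-comm */

    MPI_Comm_free(&merged);
    MPI_Comm_free(&inter);
    MPI_Comm_free(&half);
    MPI_Finalize();
    return 0;
}
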



On Jun 6, 2014, at 11:26 AM, Rolf vandeVaart <rvandeva...@nvidia.com> wrote:

> I am seeing an interesting failure on trunk.  intercomm_create, spawn, and 
> spawn_multiple from the IBM tests hang if I explicitly list the hostnames to 
> run on.  For example:
> 
> Good:
> $ mpirun -np 2 --mca btl self,sm,tcp spawn_multiple
> Parent: 0 of 2, drossetti-ivy0.nvidia.com (0 in init)
> Parent: 1 of 2, drossetti-ivy0.nvidia.com (0 in init)
> Child: 0 of 4, drossetti-ivy0.nvidia.com (this is job 1) (1 in init)
> Child: 1 of 4, drossetti-ivy0.nvidia.com (this is job 1) (1 in init)
> Child: 2 of 4, drossetti-ivy0.nvidia.com (this is job 2) (1 in init)
> Child: 3 of 4, drossetti-ivy0.nvidia.com (this is job 2) (1 in init)
> $ 
> 
> Bad:
> $ mpirun -np 2 --mca btl self,sm,tcp -host drossetti-ivy0,drossetti-ivy0 
> spawn_multiple
> Parent: 0 of 2, drossetti-ivy0.nvidia.com (1 in init)
> Parent: 1 of 2, drossetti-ivy0.nvidia.com (1 in init)
> Child: 0 of 4, drossetti-ivy0.nvidia.com (this is job 1) (1 in init)
> Child: 1 of 4, drossetti-ivy0.nvidia.com (this is job 1) (1 in init)
> Child: 2 of 4, drossetti-ivy0.nvidia.com (this is job 2) (1 in init)
> Child: 3 of 4, drossetti-ivy0.nvidia.com (this is job 2) (1 in init)
> [..and we are hung here...]
> 
> I see the exact same behavior for spawn and spawn_multiple.  Ralph, any 
> thoughts?  Open MPI 1.8 is fine.  I can provide more information if needed, 
> but I assume this is reproducible. 
> 
> Thanks,
> Rolf