I am seeing an interesting failure on trunk. The intercomm_create, spawn, and spawn_multiple tests from the IBM test suite hang if I explicitly list the hostnames to run on. For example:

Good:

$ mpirun -np 2 --mca btl self,sm,tcp spawn_multiple
Parent: 0 of 2, drossetti-ivy0.nvidia.com (0 in init)
Parent: 1 of 2, drossetti-ivy0.nvidia.com (0 in init)
Child: 0 of 4, drossetti-ivy0.nvidia.com (this is job 1) (1 in init)
Child: 1 of 4, drossetti-ivy0.nvidia.com (this is job 1) (1 in init)
Child: 2 of 4, drossetti-ivy0.nvidia.com (this is job 2) (1 in init)
Child: 3 of 4, drossetti-ivy0.nvidia.com (this is job 2) (1 in init)
$

Bad:

$ mpirun -np 2 --mca btl self,sm,tcp -host drossetti-ivy0,drossetti-ivy0 spawn_multiple
Parent: 0 of 2, drossetti-ivy0.nvidia.com (1 in init)
Parent: 1 of 2, drossetti-ivy0.nvidia.com (1 in init)
Child: 0 of 4, drossetti-ivy0.nvidia.com (this is job 1) (1 in init)
Child: 1 of 4, drossetti-ivy0.nvidia.com (this is job 1) (1 in init)
Child: 2 of 4, drossetti-ivy0.nvidia.com (this is job 2) (1 in init)
Child: 3 of 4, drossetti-ivy0.nvidia.com (this is job 2) (1 in init)
[..and we are hung here...]

I see the exact same behavior for spawn and spawn_multiple. Ralph, any thoughts? Open MPI 1.8 is fine. I can provide more information if needed, but I assume this is reproducible.

Thanks,
Rolf