Ralph,

I noticed several hangs in MTT with the v1.8 branch.

A simple way to reproduce them is to use the MPI_Errhandler_fatal_f test from the intel_tests suite: invoke mpirun on one node and run the tasks on another node:

    node0$ mpirun -np 3 -host node1 --mca btl tcp,self ./MPI_Errhandler_fatal_f

Since this is a race condition, you might need to run this in a loop in order to hit the bug.

The attached tarball contains a patch (added debug output plus a temporary hack) and some log files obtained with

    --mca errmgr_base_verbose 100 --mca odls_base_verbose 100

Without the hack, I can reproduce the bug with -np 3 (log.ko.txt). With the hack, I can still reproduce a hang (though it might be a different one) with -np 16 (log.ko.2.txt).

I remember some similar hangs were fixed on the trunk/master a few months ago. I tried to backport some commits, but it did not help :-(

Could you please have a look at this?

Cheers,

Gilles
abort_hang.tar.gz
Description: application/gzip
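
For anyone trying to hit the race, the "run it in a loop" step mentioned above could be sketched roughly as follows. This is not part of the attached tarball. The function name run_until_hang, the run count and the time limit are illustrative assumptions, and it relies on the GNU coreutils timeout command, which exits with status 124 when the time limit expires. Since the test is expected to abort, only a timeout is treated as a hang, not a non-zero exit status.

```shell
#!/bin/sh
# Hypothetical helper: repeat a command until one run exceeds a time limit.
# Usage: run_until_hang MAX_RUNS LIMIT_SECONDS COMMAND [ARGS...]
# Returns 1 as soon as a run times out (treated as a hang), 0 otherwise.
run_until_hang() {
    max_runs=$1
    limit=$2
    shift 2
    i=1
    while [ "$i" -le "$max_runs" ]; do
        # GNU coreutils timeout exits with 124 when the limit is reached.
        timeout "$limit" "$@" >/dev/null 2>&1
        rc=$?
        if [ "$rc" -eq 124 ]; then
            echo "run $i: hang detected after ${limit}s"
            return 1
        fi
        i=$((i + 1))
    done
    echo "no hang in $max_runs runs"
    return 0
}
```

With the function loaded, something like "run_until_hang 100 60 mpirun -np 3 -host node1 --mca btl tcp,self ./MPI_Errhandler_fatal_f" would stop at the first run that takes longer than 60 seconds. Both 100 and 60 are arbitrary values and would need adjusting to the cluster.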