You were faster to fix the bug than I was to send my bug report :-)
So I confirm: this fixes the problem.
Thanks!
Sylvain
On Mon, 21 Sep 2009, Edgar Gabriel wrote:
What version of Open MPI did you use? Patch #21970 should have fixed this
issue on the trunk...
Thanks
Edgar
Sylvain Jeaugey wrote:
Hi list,
We are currently experiencing deadlocks when using communicators other than
MPI_COMM_WORLD, so we wrote a very simple reproducer (MPI_Comm_create, then
MPI_Barrier on the new communicator; see the end of this e-mail).
We can reproduce the deadlock only with openib and with at least 8 cores
(we could not reproduce it with sm), and it takes about 20 runs on average.
Using a larger number of cores greatly increases the occurrence of the
deadlock. When the deadlock occurs, every even process is stuck in
MPI_Finalize and every odd process is in MPI_Barrier.
So we tracked the bug through the changesets and found that this patch seems
to have introduced it:
user: brbarret
date: Tue Aug 25 15:13:31 2009 +0000
summary: Per discussion in ticket #2009, temporarily disable the block
CID allocation algorithms until they properly reuse CIDs.
Reverting to the non-multi-threaded CID allocator makes the deadlock
disappear.
I tried to dig further and understand why this makes a difference, with no
luck.
If anyone can figure out what's happening, that would be great ...
Thanks,
Sylvain
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, numTasks;
    int range[3];
    MPI_Comm testComm, dupComm;
    MPI_Group orig_group, new_group;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &numTasks);
    MPI_Comm_group(MPI_COMM_WORLD, &orig_group);

    range[0] = 0;            /* first rank */
    range[1] = numTasks - 1; /* last rank */
    range[2] = 1;            /* stride */
    MPI_Group_range_incl(orig_group, 1, &range, &new_group);
    MPI_Comm_create(MPI_COMM_WORLD, new_group, &testComm);

    MPI_Barrier(testComm);
    MPI_Finalize();
    return 0;
}
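
A note on the range triplet used above: MPI_Group_range_incl takes
(first, last, stride) triplets, and with (0, numTasks - 1, 1) the new group
contains every rank of MPI_COMM_WORLD, so testComm has the same membership as
the world communicator. The sketch below shows how one triplet expands into
ranks in plain C, without MPI. The helper name expand_range is hypothetical
(it is not an MPI function), and it only handles positive strides, although
MPI also accepts negative ones.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of MPI: expands one (first, last, stride)
 * triplet the way MPI_Group_range_incl selects ranks, for stride > 0 only.
 * Stores at most `cap` ranks in `out` and returns the number of ranks the
 * triplet selects, or -1 if this sketch does not handle the triplet. */
int expand_range(const int range[3], int *out, size_t cap)
{
    int first = range[0];
    int last = range[1];
    int stride = range[2];

    assert(out != NULL || cap == 0);

    /* A zero stride is erroneous in MPI; negative strides are legal in MPI
     * but deliberately left out of this sketch. */
    if (stride <= 0 || first < 0 || last < first)
        return -1;

    /* Selected ranks: first, first + stride, ..., up to and including last. */
    int count = (last - first) / stride + 1;
    for (int i = 0; i < count; i++) {
        if ((size_t)i < cap)
            out[i] = first + i * stride;
    }
    return count;
}
```

For an 8-process run, the triplet (0, 7, 1) from the reproducer selects all
8 ranks, whereas a triplet such as (1, 7, 2) would select only the odd ones.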
_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel
--
Edgar Gabriel
Assistant Professor
Parallel Software Technologies Lab http://pstl.cs.uh.edu
Department of Computer Science University of Houston
Philip G. Hoffman Hall, Room 524 Houston, TX-77204, USA
Tel: +1 (713) 743-3857 Fax: +1 (713) 743-3335