Don, Galen, and I talked about this in depth on the phone today and think that it is a symptom of the same issue discussed in this thread:

    http://www.open-mpi.org/community/lists/devel/2007/10/2382.php

Note my message in that thread from just a few minutes ago:

    http://www.open-mpi.org/community/lists/devel/2007/11/2561.php

We think that the proposed solution in that thread will also fix the mpi_preconnect_all issues (i.e., the ping-pong that Don proposes in his mail should not be necessary).



On Oct 17, 2007, at 10:54 AM, Don Kerr wrote:

All,

I have noticed an issue in the 1.2 branch when mpi_preconnect_all=1. The
one-way communication pattern (each rank either sends to or receives from
each peer) may not fully establish connections with peers. For example, if
I have a 3-process MPI job and rank 0 does not do any MPI communication
after MPI_Init(), the other ranks' attempts to connect to it will not be
progressed (I have seen this with both tcp and udapl).
The preconnect pattern has changed slightly on the trunk, but it is
essentially still one-way communication: either a send or a receive with
each rank. So although the issue I see in the 1.2 branch does not appear
on the trunk, I wonder whether it will show up again.
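
To make the pattern concrete, here is a minimal, hypothetical sketch of that kind of one-way preconnect expressed at the MPI level. This is not the actual code in the 1.2 branch or the trunk; the function name and message tag are made up purely for illustration:

    /* Hypothetical sketch of a one-way preconnect: for each pair of
     * ranks, the lower rank only sends and the higher rank only
     * receives a single byte.  This is NOT the actual Open MPI
     * implementation; it just illustrates the pattern under
     * discussion. */
    #include <mpi.h>

    static void preconnect_one_way(MPI_Comm comm)
    {
        int rank, size, peer;
        char byte = 0;

        MPI_Comm_rank(comm, &rank);
        MPI_Comm_size(comm, &size);

        for (peer = 0; peer < size; ++peer) {
            if (peer == rank) {
                continue;
            }
            if (rank < peer) {
                /* One side of the pair only sends ... */
                MPI_Send(&byte, 1, MPI_CHAR, peer, 1, comm);
            } else {
                /* ... and the other side only receives. */
                MPI_Recv(&byte, 1, MPI_CHAR, peer, 1, comm,
                         MPI_STATUS_IGNORE);
            }
        }
    }

The point is that a rank whose sends complete eagerly may never re-enter the progress engine for that peer, so the connection can be left only partially established.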

An alternative to the preconnect pattern that comes to mind would be to
perform both a send and a receive between all ranks, to ensure that
connections have been fully established.
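
A minimal sketch of what that alternative could look like, again at the MPI level and purely for illustration (the function name and tag are assumptions, not proposed Open MPI code):

    /* Hypothetical sketch of the "send and receive with everyone"
     * alternative: every pair of ranks does a full ping-pong, so each
     * side must progress the connection in both directions before the
     * loop completes. */
    #include <mpi.h>

    static void preconnect_ping_pong(MPI_Comm comm)
    {
        int rank, size, peer;
        char byte = 0;

        MPI_Comm_rank(comm, &rank);
        MPI_Comm_size(comm, &size);

        for (peer = 0; peer < size; ++peer) {
            if (peer == rank) {
                continue;
            }
            if (rank < peer) {
                /* Lower rank sends first, then waits for the reply. */
                MPI_Send(&byte, 1, MPI_CHAR, peer, 1, comm);
                MPI_Recv(&byte, 1, MPI_CHAR, peer, 1, comm,
                         MPI_STATUS_IGNORE);
            } else {
                /* Higher rank receives first, then replies. */
                MPI_Recv(&byte, 1, MPI_CHAR, peer, 1, comm,
                         MPI_STATUS_IGNORE);
                MPI_Send(&byte, 1, MPI_CHAR, peer, 1, comm);
            }
        }
    }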

Does anyone have thoughts or comments on this, or see reasons not to have
all ranks send to and receive from all others?

-DON


--
Jeff Squyres
Cisco Systems
