More information about this can be found in trac ticket #1127.


On Sep 18, 2007, at 10:20 AM, Gleb Natapov wrote:

On Tue, Sep 18, 2007 at 09:44:42AM -0400, George Bosilca wrote:
The setup of a communicator includes, as its last stage, a collective
communication. As a result, some of the nodes can exit the collective
before the others and can therefore start sending messages using this
communicator [while some of the other nodes are still waiting for the
collective to complete]. This leads to a situation where a node receives
a message for a communicator that it is still building.

There is a bug filed in trac about this. In FT-MPI we temporarily put these
messages in an internal queue and deliver them to the right communicator
only once that communicator is completely created.
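
To make the queueing idea concrete, here is a minimal sketch in C. It is not the FT-MPI or Open MPI code; every name in it (comm_table_t, on_message, activate, MAX_CID) is hypothetical, and a "message" is reduced to an integer payload. It only illustrates the technique: a message that arrives for a communicator id that is not yet active is parked in a FIFO, and the parked messages are delivered in arrival order when that id is activated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MAX_CID 64  /* hypothetical upper bound on communicator ids */

/* One parked message; a real fragment would carry a header and data. */
typedef struct pending_msg {
    int cid;
    int payload;
    struct pending_msg *next;
} pending_msg_t;

typedef struct {
    int active[MAX_CID];        /* 1 once the communicator is fully created */
    int delivered[MAX_CID];     /* number of messages delivered per cid */
    int last_payload[MAX_CID];  /* most recent payload delivered per cid */
    pending_msg_t *head;        /* FIFO of messages for unknown cids */
} comm_table_t;

static void table_init(comm_table_t *t)
{
    memset(t, 0, sizeof(*t));
}

/* Stand-in for handing the message to the matching logic. */
static void deliver(comm_table_t *t, int cid, int payload)
{
    assert(t->active[cid]);
    t->delivered[cid]++;
    t->last_payload[cid] = payload;
}

/* Returns 1 if delivered at once, 0 if parked, -1 on error. */
static int on_message(comm_table_t *t, int cid, int payload)
{
    if (cid < 0 || cid >= MAX_CID)
        return -1;
    if (t->active[cid]) {
        deliver(t, cid, payload);
        return 1;
    }
    pending_msg_t *m = malloc(sizeof(*m));
    if (m == NULL)
        return -1;
    m->cid = cid;
    m->payload = payload;
    m->next = NULL;
    /* Append at the tail so arrival order is preserved. */
    pending_msg_t **pp = &t->head;
    while (*pp != NULL)
        pp = &(*pp)->next;
    *pp = m;
    return 0;
}

/* Marks cid as created and drains its parked messages in arrival order.
 * Returns the number of messages delivered, or -1 on a bad cid. */
static int activate(comm_table_t *t, int cid)
{
    if (cid < 0 || cid >= MAX_CID)
        return -1;
    t->active[cid] = 1;
    int n = 0;
    pending_msg_t **pp = &t->head;
    while (*pp != NULL) {
        pending_msg_t *m = *pp;
        if (m->cid == cid) {
            *pp = m->next;      /* unlink, then deliver and free */
            deliver(t, cid, m->payload);
            free(m);
            n++;
        } else {
            pp = &m->next;      /* leave messages for other cids parked */
        }
    }
    return n;
}
```

In this sketch the receive path would call on_message() for every incoming fragment, and the last step of communicator creation would call activate() with the new id. A single list for all unknown ids keeps the example short; a per-id queue would avoid scanning unrelated messages.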
In the ompi_comm_nextcid() function there is this code for the thread_multiple case:

 /* for synchronization purposes, avoids receiving fragments for
    a communicator id, which might not yet been known. For single-threaded
    scenarios, this call is in ompi_comm_activate, for multi-threaded
    scenarios, it has to be already here ( before releasing another
    thread into the cid-allocation loop ) */
 (allredfnct)(&response, &glresponse, 1, MPI_MIN, comm, bridgecomm,
                     local_leader, remote_leader, send_first );

This collective is executed on the old communicator after the new cid
has been set up. Is this not enough to solve the problem? Some ranks may
leave this collective call earlier than others, but none can leave it
before all ranks have entered it, and at that stage the new communicator
already exists on all of them. Am I missing something?


On Sep 18, 2007, at 9:06 AM, Gleb Natapov wrote:


In the comment you are saying that "a message for a not yet existing
communicator can happen". Can you explain in what situation it can


devel mailing list