On Tue, Sep 18, 2007 at 10:57:38AM -0400, George Bosilca wrote:
> More information about this can be found in trac ticket #1127
> (https://svn.open-mpi.org/trac/ompi/ticket/1127).
>
OK. So the code I cited is only a temporary solution. Thanks.

>   george.
>
> On Sep 18, 2007, at 10:20 AM, Gleb Natapov wrote:
>
>> On Tue, Sep 18, 2007 at 09:44:42AM -0400, George Bosilca wrote:
>>> The setup of a communicator includes, as a last stage, a collective
>>> communication. As a result, some of the nodes can exit the collective
>>> before the others and therefore can start sending messages using this
>>> communicator [while some of the other nodes are still waiting for the
>>> collective completion]. This will lead to a situation where a node
>>> receives a message for a communicator that it is still building up.
>>>
>>> There is a bug filed in trac about this. In FT-MPI we temporarily put
>>> these messages in an internal queue, and deliver them to the right
>>> communicator only once this communicator is completely created.
>> In the ompi_comm_nextcid() function there is this code for the
>> thread_multiple case:
>>
>> /* for synchronization purposes, avoids receiving fragments for
>>    a communicator id, which might not yet been known. For single-threaded
>>    scenarios, this call is in ompi_comm_activate, for multi-threaded
>>    scenarios, it has to be already here ( before releasing another
>>    thread into the cid-allocation loop ) */
>> (allredfnct)(&response, &glresponse, 1, MPI_MIN, comm, bridgecomm,
>>              local_leader, remote_leader, send_first );
>>
>> This collective is executed on the old communicator after setup of a new
>> cid. Is this not enough to solve the problem? Some ranks may leave
>> this collective call earlier than others, but none can leave it before
>> all ranks enter it, and at this stage the new communicator already exists
>> in all of them. Am I missing something?
>>
>>>
>>>   george.
>>>
>>> On Sep 18, 2007, at 9:06 AM, Gleb Natapov wrote:
>>>
>>>> George,
>>>>
>>>> In the comment you are saying that "a message for a not yet existing
>>>> communicator can happen". Can you explain in what situation it can
>>>> happen?
>>>>
>>>> Thanks,
>>>>
>>>> --
>>>> Gleb.
>>>> _______________________________________________
>>>> devel mailing list
>>>> de...@open-mpi.org
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>
>>
>> --
>> Gleb.
>

--
Gleb.
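
The queuing approach George describes for FT-MPI (hold fragments that arrive for a communicator still being built, then deliver them once it is completely created) can be sketched roughly as below. This is only an illustration under stated assumptions, not code from FT-MPI or Open MPI: every name here (proc_state_t, on_fragment, comm_activate, MAX_CID, LOG_MAX) is hypothetical, a fragment is reduced to a single int payload, "delivery" just appends to a log, and there is no locking, so it does not cover the thread_multiple case discussed above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define MAX_CID 64   /* hypothetical bound on communicator ids */
#define LOG_MAX 128  /* hypothetical bound on the delivery log */

/* One incoming fragment; a real one would carry a header and data. */
typedef struct frag {
    int cid;              /* communicator id from the fragment header */
    int payload;
    struct frag *next;
} frag_t;

/* Per-process state: which cids are fully created, plus a FIFO of
 * fragments that arrived too early. */
typedef struct {
    bool    active[MAX_CID];
    frag_t *head;
    frag_t *tail;
    int     log[LOG_MAX];  /* payloads in the order they were delivered */
    int     log_len;
} proc_state_t;

void state_init(proc_state_t *s)
{
    for (int i = 0; i < MAX_CID; i++) s->active[i] = false;
    s->head = s->tail = NULL;
    s->log_len = 0;
}

/* Stand-in for handing the fragment to the matching logic. */
void deliver(proc_state_t *s, int payload)
{
    assert(s->log_len < LOG_MAX);
    s->log[s->log_len++] = payload;
}

/* Called for each arriving fragment.
 * Returns 1 if delivered, 0 if queued, -1 on bad cid or allocation failure. */
int on_fragment(proc_state_t *s, int cid, int payload)
{
    if (cid < 0 || cid >= MAX_CID) return -1;
    if (s->active[cid]) { deliver(s, payload); return 1; }

    /* Communicator not created here yet: park the fragment. */
    frag_t *f = malloc(sizeof *f);
    if (f == NULL) return -1;
    f->cid = cid; f->payload = payload; f->next = NULL;
    if (s->tail) s->tail->next = f; else s->head = f;
    s->tail = f;
    return 0;
}

/* Called once the communicator is completely created locally.
 * Delivers parked fragments for this cid in arrival order and
 * returns how many were delivered (-1 on bad cid). */
int comm_activate(proc_state_t *s, int cid)
{
    if (cid < 0 || cid >= MAX_CID) return -1;
    s->active[cid] = true;

    int n = 0;
    frag_t **link = &s->head;
    frag_t *last = NULL;           /* last fragment that stays queued */
    while (*link != NULL) {
        frag_t *f = *link;
        if (f->cid == cid) {
            *link = f->next;       /* unlink, deliver, release */
            deliver(s, f->payload);
            free(f);
            n++;
        } else {
            last = f;
            link = &f->next;
        }
    }
    s->tail = last;
    return n;
}

/* Number of fragments still parked. */
int pending_count(const proc_state_t *s)
{
    int n = 0;
    for (const frag_t *f = s->head; f != NULL; f = f->next) n++;
    return n;
}
```

In this sketch a receive path would call on_fragment() for every incoming fragment and comm_activate() when communicator creation finishes locally. Draining the queue in FIFO order keeps early fragments in their arrival order. The allreduce Gleb cites takes a different route: it synchronizes on the old communicator so that, by his reasoning, no early fragment should arrive in the first place.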