Jeff Squyres wrote:

> There's no synchronization *guarantee* in MPI collectives except for MPI_BARRIER. [...] BCAST *can* synchronize; I'm not saying it has to.

I fully agree with Jeff and would even go a step further.

As has already been noted, there are also some implicit data dependencies because we do "message passing": a receiver can only get a message after the sender has posted it. So yes, all processes get their broadcast message only after the root has called MPI_Bcast, and the same holds for similar calls. But does this necessarily imply that all processes block in such a call and return only after the senders have joined the communication? In my opinion, no correct and portable MPI program should rely on anything that is not explicitly stated in the standard.

Example to think about: I developed an MPI wrapper several years ago (for a slow interconnect), which returned almost immediately from blocking MPI calls. Instead of wasting time waiting for the senders, it used features of the virtual memory subsystem to protect the given message buffers from not-yet-allowed accesses (i.e., write access for send buffers and read access for receive buffers), and started the communication in the background like the nonblocking variants. The blocking (if any) happened only when the data was actually accessed by the processor (so the implicit synchronization point we are talking about was just delayed). This enabled overlap of communication and computation without rewriting the application (even for send operations, or for large messages thanks to pipelining): just relink and see if it gets faster. I'm not totally sure that this is 100% MPI-conformant, but as long as programmers don't rely on anything that is not explicitly stated in the standard, they could benefit from such implementations...

