Jeff Squyres wrote:
> There's no synchronization *guarantee* in MPI collectives except for
> MPI_BARRIER. [...] BCAST *can* synchronize; I'm not saying it has to.
I fully agree with Jeff and would even go a step further.
As has already been noted, there are also some implicit data
dependencies due to the fact that we do "message passing". This means
that a receiver can only get a message after the sender has posted it.
So yes, all processes get their broadcast message only after the root
has called MPI_Bcast (and likewise for the other collectives). But does
this necessarily imply that all processes block in such a call and
return only after the senders have joined the communication? In my
opinion, no correct and portable MPI
program should rely on anything that is not explicitly stated in the
standard.
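A minimal sketch of that one-directional dependency, using POSIX threads instead of MPI so it runs standalone. The names `bcast_chan`, `bcast_root`, `bcast_recv` and `recv_thread` are hypothetical, not MPI API. The "broadcast" here has eager, buffered semantics: the root deposits its data and returns at once, and receivers block only until the data exists. Nothing forces the root to wait for the receivers.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical one-shot broadcast channel with eager (buffered) semantics. */
struct bcast_chan {
    pthread_mutex_t lock;
    pthread_cond_t ready;
    int posted;        /* set once the root has deposited the data */
    char data[64];
};

#define BCAST_CHAN_INIT \
    { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0, { 0 } }

/* Root side: deposit the message and return immediately. It never waits for
   receivers, so "the root has returned" says nothing about where they are. */
static void bcast_root(struct bcast_chan *c, const char *msg) {
    pthread_mutex_lock(&c->lock);
    strncpy(c->data, msg, sizeof c->data - 1); /* data[63] stays '\0' */
    c->posted = 1;
    pthread_cond_broadcast(&c->ready);
    pthread_mutex_unlock(&c->lock);
}

/* Receiver side: blocks only until the root has posted (the data dependency). */
static void bcast_recv(struct bcast_chan *c, char *out, size_t n) {
    assert(n > 0);
    pthread_mutex_lock(&c->lock);
    while (!c->posted)
        pthread_cond_wait(&c->ready, &c->lock);
    strncpy(out, c->data, n - 1);
    out[n - 1] = '\0';
    pthread_mutex_unlock(&c->lock);
}

/* Thread wrapper so receivers can run concurrently with the root. */
struct recv_arg {
    struct bcast_chan *chan;
    char out[64];
};

static void *recv_thread(void *p) {
    struct recv_arg *a = p;
    bcast_recv(a->chan, a->out, sizeof a->out);
    return NULL;
}
```

Calling `bcast_root` before any receiver thread even exists still returns. The MPI standard likewise allows a collective call to return as soon as the caller's own part is done, without any indication that the other processes have completed or even started the operation, so code that treats MPI_Bcast as a barrier relies on behavior that is not promised.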
Example to think about: several years ago I developed an MPI wrapper
(for a slow interconnect) that returned almost immediately from
blocking MPI calls. Instead of wasting time waiting for the senders,
it used the virtual memory subsystem to protect the given message
buffers against accesses that were not yet allowed (i.e., write
access to send buffers and read access to receive buffers), and
started the communication in the background, as the nonblocking
variants do. Blocking, if it happened at all, occurred only when the
processor actually accessed the data, so the implicit synchronization
point we are talking about was merely deferred. This enabled overlap
of communication and computation without rewriting the application
(even for send operations or large messages, thanks to pipelining):
just relink and see if it gets faster. I'm not entirely sure that
this is 100% MPI-conformant, but as long as programmers don't rely on
anything that is not explicitly stated in the standard, they could
benefit from such implementations...
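The buffer-protection trick can be sketched in C with POSIX `mmap`, `mprotect` and a `SIGSEGV` handler. This is a hypothetical illustration, not the wrapper described above: it assumes Linux (where touching a `PROT_NONE` page raises `SIGSEGV`), tracks a single outstanding receive, and replaces the network with a local source buffer that the fault handler copies in. A real wrapper would instead wait for the background transfer inside the handler. `mprotect` is not on POSIX's async-signal-safe list, so calling it from a fault handler is a common but platform-specific idiom. A send buffer would be protected with `PROT_READ` instead, so that only writes trap. The names `pending_recv`, `alloc_pages` and `lazy_recv` are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical record for one outstanding receive. `src` stands in for
   data that a real wrapper would be receiving in the background. */
struct pending_recv {
    char *buf;                 /* page-aligned user receive buffer */
    size_t len;                /* protected length (whole pages) */
    const char *src;           /* stand-in for the incoming message */
    size_t nbytes;             /* message size, <= len */
    volatile sig_atomic_t done;
};

static struct pending_recv *g_pending; /* single request, for simplicity */

/* Deferred synchronization point: runs on the first access to the buffer. */
static void on_fault(int sig, siginfo_t *info, void *ctx) {
    (void)sig;
    (void)ctx;
    struct pending_recv *p = g_pending;
    uintptr_t a = (uintptr_t)info->si_addr;
    uintptr_t lo = p ? (uintptr_t)p->buf : 0;
    if (p == NULL || a < lo || a >= lo + p->len) {
        /* A genuine segfault: restore the default action; returning
           re-executes the faulting access, which then crashes normally. */
        signal(SIGSEGV, SIG_DFL);
        return;
    }
    /* A real wrapper would wait for the transfer here; we copy the data in. */
    mprotect(p->buf, p->len, PROT_READ | PROT_WRITE);
    for (size_t i = 0; i < p->nbytes; i++)
        p->buf[i] = p->src[i];
    p->done = 1;
    g_pending = NULL;
}

/* Round the size up to whole pages so mprotect covers exactly this buffer. */
static char *alloc_pages(size_t *len) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t n = (*len + page - 1) / page * page;
    void *p = mmap(NULL, n, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    *len = n;
    return p;
}

/* "Blocking" receive that returns at once: install the handler, record the
   request, and revoke all access to the buffer until the data is in place. */
static int lazy_recv(char *buf, size_t len, const char *src, size_t nbytes,
                     struct pending_recv *req) {
    assert(nbytes <= len);
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0)
        return -1;
    req->buf = buf;
    req->len = len;
    req->src = src;
    req->nbytes = nbytes;
    req->done = 0;
    g_pending = req;
    return mprotect(buf, len, PROT_NONE);
}
```

Usage: allocate the buffer with `alloc_pages`, post it with `lazy_recv`, and then simply read it. The call returns before any data is present, and the first read is where the deferred wait happens.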