On Tuesday 23 November 2010 at 16:07 -0500, Eugene Loh wrote:
> Sébastien Boisvert wrote:
>
> >Now I can describe the cases.
>
> The test cases can all be explained by the test requiring eager messages
> (something that test4096.cpp does not require).
>
> >Case 1: 30 MPI ranks, message size is 4096 bytes
> >
> >File: mpirun-np-30-Program-4096.txt
> >Outcome: It hangs -- I killed the poor thing after 30 seconds or so.
>
> 4096 is rendezvous. For eager, try 4000 or lower.

According to ompi_info, the threshold is 4096, not 4000, right? (Open-MPI 1.4.3)

[sboisver12@colosse1 ~]$ ompi_info -a|less
MCA btl: parameter "btl_sm_eager_limit" (current value: "4096", data source: default value)
         Maximum size (in bytes) of "short" messages (must be >= 1).

The FAQ describes the same parameter this way:

"btl_sm_eager_limit: Below this size, messages are sent "eagerly" -- that
is, a sender attempts to write its entire message to shared buffers
without waiting for a receiver to be ready. Above this size, a sender
will only write the first part of a message, then wait for the receiver
to acknowledge its ready before continuing. Eager sends can improve
performance by decoupling senders from receivers."

source: http://www.open-mpi.org/faq/?category=sm#more-sm

The FAQ should say "Below or equal to this size" instead of "Below this
size", to match ompi_info, which describes the value as a maximum size. ;)

As Mr. George Bosilca put it:

"__should__ is not correct, __might__ is a better verb to describe the
most "common" behavior for small messages. The problem comes from the
fact that in each communicator the FIFO ordering is required by the MPI
standard. As soon as there is any congestion, MPI_Send will block even
for small messages (and this independent on the underlying network)
until all the pending packets have been delivered."

source: http://www.open-mpi.org/community/lists/devel/2010/11/8696.php

> >Case 2: 30 MPI ranks, message size is 1 byte
> >
> >File: mpirun-np-30-Program-1.txt.gz
> >Outcome: It runs just fine.
>
> 1 byte is eager.

I agree.

> >Case 3: 2 MPI ranks, message size is 4096 bytes
> >
> >File: mpirun-np-2-Program-4096.txt
> >Outcome: It hangs -- I killed the poor thing after 30 seconds or so.
>
> Same as Case 1.
>
> >Case 4: 30 MPI ranks, message size is 4096 bytes, shared memory is
> >disabled
> >
> >File: mpirun-mca-btl-^sm-np-30-Program-4096.txt.gz
> >Outcome: It runs just fine.
>
> Eager limit for TCP is 65536 (perhaps less some overhead). So, these
> messages are eager.

I agree.
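
One way to reconcile "the limit is 4096" with "4096 is rendezvous" is to assume that the eager limit covers a per-message header as well as the payload, which is what "perhaps less some overhead" hints at for TCP. The sketch below is only an illustration under that assumption, not Open-MPI's actual logic: the names classify, kSmEagerLimit, kTcpEagerLimit and kAssumedHeaderBytes are hypothetical, and the 64-byte header is a made-up value, since the real overhead is not stated in this thread.

```cpp
#include <cstddef>

// Protocol a point-to-point message would use in this simplified model.
enum class Protocol { Eager, Rendezvous };

// Eager limits quoted in this thread.
constexpr std::size_t kSmEagerLimit  = 4096;   // btl_sm_eager_limit reported by ompi_info (Open-MPI 1.4.3)
constexpr std::size_t kTcpEagerLimit = 65536;  // TCP eager limit quoted by Eugene

// ASSUMPTION: hypothetical header size, chosen only for illustration.
constexpr std::size_t kAssumedHeaderBytes = 64;

// A message counts as eager when payload plus header fits within the limit.
// The limit is treated as a maximum size, so the comparison is <=, not <.
constexpr Protocol classify(std::size_t payload_bytes,
                            std::size_t eager_limit,
                            std::size_t header_bytes = kAssumedHeaderBytes) {
    return (payload_bytes + header_bytes <= eager_limit) ? Protocol::Eager
                                                         : Protocol::Rendezvous;
}
```

Under these assumptions, classify(4096, kSmEagerLimit) gives Rendezvous (Cases 1 and 3), classify(1, kSmEagerLimit) gives Eager (Case 2), and classify(4096, kTcpEagerLimit) gives Eager (Case 4). With a header of zero bytes, a 4096-byte payload would be eager over shared memory, which is the reading that ompi_info's "maximum size" wording suggests.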