Sébastien Boisvert wrote:

Now I can describe the cases.
All of these test cases can be explained by the test relying on eager message delivery (something that test4096.cpp does not require).

Case 1: 30 MPI ranks, message size is 4096 bytes

File: mpirun-np-30-Program-4096.txt
Outcome: It hangs -- I killed the poor thing after 30 seconds or so.
A 4096-byte message is sent with the rendezvous protocol. For eager delivery, try 4000 bytes or lower.

Case 2: 30 MPI ranks, message size is 1 byte

File: mpirun-np-30-Program-1.txt.gz
Outcome: It runs just fine.
A 1-byte message is eager.

Case 3: 2 MPI ranks, message size is 4096 bytes

File: mpirun-np-2-Program-4096.txt
Outcome: It hangs -- I killed the poor thing after 30 seconds or so.
Same as Case 1.

Case 4: 30 MPI ranks, message size is 4096 bytes, shared memory is disabled

File: mpirun-mca-btl-^sm-np-30-Program-4096.txt.gz
Outcome: It runs just fine.
The eager limit for TCP is 65536 bytes (perhaps less some overhead), so these messages are eager.
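
To make the four outcomes easier to compare, here is a rough C++ sketch of the size check that decides between eager and rendezvous. It is only a model, not Open MPI code. The names classify, kSmEagerLimit, kTcpEagerLimit and kHeaderBytes are made up for illustration. The shared-memory limit of 4096 is an assumption consistent with the observations above, and the 64-byte header is a placeholder for whatever per-message overhead the transport really adds.

```cpp
#include <cstddef>

// Protocol an MPI implementation might pick for a point-to-point message.
enum class Protocol { Eager, Rendezvous };

// Assumed values for illustration; none of them are read from Open MPI.
// kSmEagerLimit:  consistent with "4096 is rendezvous, 4000 or lower is eager".
// kTcpEagerLimit: the 65536 figure quoted for TCP.
// kHeaderBytes:   hypothetical per-message overhead; the real value depends
//                 on the transport.
constexpr std::size_t kSmEagerLimit = 4096;
constexpr std::size_t kTcpEagerLimit = 65536;
constexpr std::size_t kHeaderBytes = 64;

// A message is eager only if the payload plus the header fits in the limit.
constexpr Protocol classify(std::size_t payload, std::size_t eager_limit) {
    return (payload + kHeaderBytes <= eager_limit) ? Protocol::Eager
                                                   : Protocol::Rendezvous;
}
```

Under these assumptions, classify(4096, kSmEagerLimit) gives Rendezvous (Cases 1 and 3), classify(1, kSmEagerLimit) gives Eager (Case 2), and classify(4096, kTcpEagerLimit) gives Eager (Case 4). A test that only completes when its messages are eager would therefore hang in Cases 1 and 3 and run in Cases 2 and 4.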

