Howard,

Did you bump both btl_tcp_rndv_eager_limit and btl_tcp_eager_limit?

You might also need to bump btl_tcp_sndbuf, btl_tcp_rcvbuf, and btl_tcp_max_send_size to get the maximum performance out of your 100 Gb Ethernet cards.

Last but not least, you might also need to bump btl_tcp_links to saturate your network (that is likely a good thing when running one task per node, but it can decrease performance when running several tasks per node).
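
For example, all of these knobs can be passed on the mpirun command line. Here is a rough sketch (the values are just illustrative starting points I made up for this mail, not recommendations, and it assumes a two node allocation with osu_bw in the PATH):

    mpirun -np 2 --map-by node \
        --mca btl tcp,self \
        --mca btl_tcp_eager_limit 524288 \
        --mca btl_tcp_rndv_eager_limit 524288 \
        --mca btl_tcp_sndbuf 4194304 \
        --mca btl_tcp_rcvbuf 4194304 \
        --mca btl_tcp_max_send_size 524288 \
        --mca btl_tcp_links 2 \
        osu_bw

ompi_info --param btl tcp --level 9 should list all of these parameters with their current defaults and descriptions, so you can double check the spelling and values on your build.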

Cheers,


Gilles


On 7/19/2016 6:57 AM, Howard Pritchard wrote:
Hi Folks,

I have a cluster with some 100 Gb Ethernet cards installed.
What we are noticing is that if we force Open MPI 1.10.3
to go through the TCP BTL (rather than yalla), the
performance of osu_bw falls off a cliff once the TCP BTL
switches from eager to rendezvous (> 32 KB), going from
about 1.6 GB/sec to 233 MB/sec, and it stays that way out
to 4 MB message lengths.

There's nothing wrong with the IP stack (iperf -P4 gives
63 Gb/sec).

So, my questions are:

1) Is this performance expected for the TCP BTL when in
rendezvous mode?
2) Is there some way to get closer to the single-socket
performance obtained with iperf for large messages (~16 Gb/sec)?

We tried adjusting the btl_tcp rendezvous threshold, but it
doesn't appear to actually be adjustable from the mpirun
command line.

Thanks for any suggestions,

Howard




