Hi Howard,

Was this issue resolved? If so, what was the solution? Please let me know. I'm curious, since we are also experimenting with these limits.
Thanks,
- Sreenidhi.

On Tue, Jul 19, 2016 at 10:50 AM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:

> Howard,
>
>
> did you bump both btl_tcp_rndv_eager_limit and btl_tcp_eager_limit ?
>
> you might also need to bump btl_tcp_sndbuf, btl_tcp_rcvbuf and
> btl_tcp_max_send_size to get the max performance out of your 100Gb ethernet
> cards
>
> last but not least, you might also need to bump btl_tcp_links to saturate
> your network (that is likely a good thing when running 1 task per node, but
> that can lead to decreased performance when running several tasks per node)
>
> Cheers,
>
>
> Gilles
>
> On 7/19/2016 6:57 AM, Howard Pritchard wrote:
>
> Hi Folks,
>
> I have a cluster with some 100 Gb ethernet cards
> installed. What we are noticing if we force Open MPI 1.10.3
> to go through the TCP BTL (rather than yalla) is that
> the performance of osu_bw once the TCP BTL switches
> from eager to rendezvous (> 32KB)
> falls off a cliff, going from about 1.6 GB/sec to 233 MB/sec
> and stays that way out to 4 MB message lengths.
>
> There's nothing wrong with the IP stack (iperf -P4 gives
> 63 Gb/sec).
>
> So, my questions are
>
> 1) is this performance expected for the TCP BTL when in
> rendezvous mode?
> 2) is there some way to get more like the single socket
> performance obtained with iperf for large messages (~16 Gb/sec).
>
> We tried adjusting the tcp_btl_rendezvous threshold but that doesn't
> appear to actually be adjustable from the mpirun command line.
>
> Thanks for any suggestions,
>
> Howard
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2016/07/19237.php
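
For anyone else trying the parameters Gilles lists, here is a rough sketch of how they could be passed on the mpirun command line with --mca. The parameter names are the ones from his reply. Everything else is an assumption for illustration: the numeric values are placeholders rather than tuned recommendations, ./osu_bw is a hypothetical path to the benchmark, and build_mpirun_command is a made-up helper name. The "pml ob1" and "btl tcp,self" selections are one common way to force the TCP BTL, which may or may not match what Howard used.

```python
import shlex

# Parameter names are taken from Gilles' reply. Every value below is a
# placeholder chosen for illustration, not a tuned recommendation.
TCP_BTL_PARAMS = {
    "btl_tcp_eager_limit": 4 * 1024 * 1024,
    "btl_tcp_rndv_eager_limit": 4 * 1024 * 1024,
    "btl_tcp_max_send_size": 4 * 1024 * 1024,
    "btl_tcp_sndbuf": 16 * 1024 * 1024,
    "btl_tcp_rcvbuf": 16 * 1024 * 1024,
    "btl_tcp_links": 4,
}


def build_mpirun_command(benchmark, nprocs=2, params=TCP_BTL_PARAMS):
    """Return an mpirun argument list that forces the TCP BTL and sets
    each MCA parameter in `params` (hypothetical helper)."""
    cmd = [
        "mpirun", "-np", str(nprocs),
        "--mca", "pml", "ob1",        # assumed: select ob1 instead of yalla
        "--mca", "btl", "tcp,self",   # assumed: restrict BTLs to TCP and self
    ]
    for name, value in params.items():
        cmd += ["--mca", name, str(value)]
    cmd.append(benchmark)
    return cmd


if __name__ == "__main__":
    # Print the command for inspection instead of running it.
    print(shlex.join(build_mpirun_command("./osu_bw")))
```

Printing the command rather than launching it makes it easy to check the values first; running "ompi_info --param btl tcp --level 9" on the target installation should show which of these parameters exist there and what their defaults are.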