We have some new Power8 nodes with dual-port FDR HCAs. I have not tested same-node Verbs throughput. Using Linux's Cross Memory Attach (CMA), I get 30 GB/s for 2 MB messages between two cores, after which it drops off to ~12 GB/s. The PCIe Gen3 x16 slots should max out at ~15 GB/s. I agree that when more than two processes are communicating, shared memory will scale higher, while the PCIe link stays capped at ~15 GB/s.
Scott

On Mar 11, 2015, at 1:41 PM, Howard Pritchard <[email protected]> wrote:

> My experience with DMA engines located on the other side of a PCIe gen3 x16
> bus from the CPUs is that for a couple of ranks doing large transfers
> between each other on a node, using the DMA engine looks good. But once
> there are multiple ranks exchanging data (up to 32 ranks on a dual-socket
> Haswell node, not using HT), using the DMA engine of the NIC is not such a
> good idea.
>
> Howard
>
> 2015-03-11 10:57 GMT-06:00 Nathan Hjelm <[email protected]>:
>
>> Definitely a side effect, though it could be beneficial in some cases, as
>> the RDMA engine in the HCA may be faster than using memcpy (above a
>> certain size). I don't know how best to fix this, as I need all
>> RDMA-capable BTLs to be listed for RMA. I thought about adding another
>> list to track BTLs that have both RMA and atomics, but that would
>> increase the memory footprint of Open MPI by a factor of nranks.
>>
>> -Nathan
>>
>> On Thu, Feb 26, 2015 at 11:59:41PM +0000, Rolf vandeVaart wrote:
>>> This message is mostly for Nathan, but I figured I would go with the
>>> wider distribution. I have noticed some different behaviour that I
>>> assume started with this change:
>>>
>>> https://github.com/open-mpi/ompi/commit/4bf7a207e90997e75ba1c60d9d191d9d96402d04
>>>
>>> I am noticing that the openib BTL will also be used for on-node
>>> communication even though the sm (or smcuda) BTL is also available. I
>>> think that with the aforementioned change the openib BTL is listed as an
>>> available BTL that supports RDMA. Looking at the bml_endpoint in the
>>> debugger, it appears that the sm BTL is listed as the eager and send
>>> BTL, but openib is listed as the RDMA BTL. From the logic in
>>> pml_ob1_sendreq.h, it looks like we can end up selecting the openib BTL
>>> for some of the communication. I ran with various verbosity settings and
>>> saw that this was happening. With v1.8, we only appear to use the sm (or
>>> smcuda) BTL.
>>>
>>> I am wondering if this was intentional with this change or maybe a side
>>> effect.
>>>
>>> Rolf
>>>
>>> ----------------------------------------------------------------------
>>> This email message is for the sole use of the intended recipient(s) and
>>> may contain confidential information. Any unauthorized review, use,
>>> disclosure or distribution is prohibited. If you are not the intended
>>> recipient, please contact the sender by reply email and destroy all
>>> copies of the original message.
>>> ----------------------------------------------------------------------
>>>
>>> _______________________________________________
>>> devel mailing list
>>> [email protected]
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> Link to this post: http://www.open-mpi.org/community/lists/devel/2015/02/17065.php
>>
>> _______________________________________________
>> devel mailing list
>> [email protected]
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: http://www.open-mpi.org/community/lists/devel/2015/03/17127.php
>
> _______________________________________________
> devel mailing list
> [email protected]
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: http://www.open-mpi.org/community/lists/devel/2015/03/17128.php
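For anyone wanting to reproduce or work around the selection behaviour Rolf describes, the standard MCA parameters `btl` and `btl_base_verbose` can show which BTL ob1 picks for on-node traffic. A sketch using the v1.8-era component names (`sm`, `openib`); `osu_bw` here is just a stand-in for whatever bandwidth benchmark you have on hand:

```shell
# Show candidate BTLs and watch which one is chosen for on-node transfers.
mpirun -np 2 --mca btl self,sm,openib --mca btl_base_verbose 100 ./osu_bw

# Workaround sketch: exclude openib entirely for a single-node run so all
# traffic stays on the shared-memory BTL.
mpirun -np 2 --mca btl self,sm ./osu_bw
```

This does not address Nathan's underlying issue (RDMA-capable BTLs must stay listed for one-sided RMA), but it confirms whether openib is being selected for point-to-point RDMA on a node.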
