Traditional sockets-based applications that want high throughput can use
rsockets. Since it is layered on top of uverbs, we expected to see good
throughput numbers.
So we ran netperf and iperf over rsockets. We observed that throughput
tops off at about 20 Gb/s with QDR adapters. A quick "perf top" revealed
a lot of cycles spent in memcpy().
We had hoped these numbers would be somewhat higher since we did not
expect the memcpy() to have such a large overhead.
Given the copy overhead, we wanted to revisit IPoIB and SDP
performance, so we installed OFED-1.5.4.1 on RHEL 6.2. We found
that SDP starts with low throughput for small packets, but seems to
catch up with rsockets at about 16 KB packets. IPoIB CM, on the other
hand, tops off at about 10 Gb/s.
Since SDP does in-kernel RDMA, we expected the IPoIB CM and SDP numbers
to be much closer. Again, "perf top" revealed that IPoIB was spending a
large number of cycles in checksum computation. Out of curiosity,
Sridhar made the following changes:
--- ipoib_cm.c.orig 2012-06-10 15:27:10.589325138 -0400
+++ ipoib_cm.c 2012-06-12 11:29:49.073262516 -0400
@@ -670,6 +670,7 @@ copied:
skb->dev = dev;
/* XXX get correct PACKET_ type here */
skb->pkt_type = PACKET_HOST;
+ skb->ip_summed = CHECKSUM_UNNECESSARY;
netif_receive_skb(skb);
@@ -1464,7 +1464,8 @@ static ssize_t set_mode(struct device *d
"will cause multicast packet drops\n");
rtnl_lock();
- dev->features &= ~(NETIF_F_IP_CSUM | NETIF_F_SG | NETIF_F_TSO);
+ dev->features &= ~(NETIF_F_SG | NETIF_F_TSO);
priv->tx_wr.send_flags &= ~IB_SEND_IP_CSUM;
if (ipoib_cm_max_mtu(dev) > priv->mcast_mtu)
With these minimal changes, IPoIB throughput reached 19-20 Gb/s
with just 2 threads. This was really unexpected. Given that, we wanted
to revisit the use of checksums in IPoIB.
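For reference, the work being skipped is roughly one pass over every
payload byte. Below is a minimal userspace sketch of the RFC 1071
ones'-complement Internet checksum. It is only an illustration of that
per-byte cost, not the kernel's optimized checksum code, and the name
inet_checksum is made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel API): RFC 1071 Internet checksum
 * over an arbitrary buffer, computed in software.
 */
uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint64_t sum = 0;

    /* Add up 16-bit big-endian words; every byte is read once. */
    while (len > 1) {
        sum += ((uint32_t)data[0] << 8) | data[1];
        data += 2;
        len -= 2;
    }

    /* An odd trailing byte is treated as padded with a zero byte. */
    if (len == 1)
        sum += (uint32_t)data[0] << 8;

    /* Fold the carries back into the low 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);

    /* The checksum is the ones' complement of the folded sum. */
    return (uint16_t)~sum;
}
```

As far as we understand it, marking the skb CHECKSUM_UNNECESSARY on
receive tells the stack that this verification pass can be skipped for
that packet.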
So, it looks worthwhile to allow for 'checksum-less' IPoIB-CM within a
cluster on a single subnet. From a checksum perspective, this would be
no different from RDMA. What are your thoughts?
Thanks
Pradeep