On Thu, 2012-06-14 at 08:24 -0700, Pradeep Satyanarayana wrote:
> Traditional sockets-based applications wanting high throughput could use
> rsockets. Since it is layered on top of uverbs, we expected to see good
> throughput numbers.
> So, we started to run netperf and iperf. We observed that it tops off at
> about 20 Gb/s with QDR adapters. A quick "perf top" revealed a lot of
> cycles spent in memcpy().
> We had hoped these numbers would be somewhat higher since we did not
> expect the memcpy() to have such a large overhead.
>
> Given the copy overhead, we wanted to revisit the IPoIB and SDP
> performance. Hence we installed OFED-1.5.4.1 on RHEL 6.2. We found
> that for small packets SDP starts with low throughput, but seems to
> catch up with rsockets at about 16 KB packets. On the other hand,
> IPoIB CM tops off at about 10 Gb/s.
>
> Since SDP does in-kernel RDMA, we expected the IPoIB CM and SDP numbers
> to be much closer. Again, "perf top" revealed that IPoIB was spending a
> large number of cycles in checksum computation. Out of curiosity,
> Sridhar made the following changes:
>
> --- ipoib_cm.c.orig 2012-06-10 15:27:10.589325138 -0400
> +++ ipoib_cm.c 2012-06-12 11:29:49.073262516 -0400
> @@ -670,6 +670,7 @@ copied:
>   skb->dev = dev;
>   /* XXX get correct PACKET_ type here */
>   skb->pkt_type = PACKET_HOST;
> + skb->ip_summed = CHECKSUM_UNNECESSARY;
>   netif_receive_skb(skb);
>
> @@ -1464,7 +1464,8 @@ static ssize_t set_mode(struct device *d
>   "will cause multicast packet drops\n");
>
>   rtnl_lock();
> - dev->features &= ~(NETIF_F_IP_CSUM | NETIF_F_SG | NETIF_F_TSO);
> + dev->features &= ~(NETIF_F_SG | NETIF_F_TSO);
Enabling NETIF_F_SG improves the throughput further, because it avoids
an additional kernel memcpy caused by skb_linearize() in dev_queue_xmit().

Thanks
Sridhar

>   priv->tx_wr.send_flags &= ~IB_SEND_IP_CSUM;
>
>   if (ipoib_cm_max_mtu(dev) > priv->mcast_mtu)
>
>
> With these minimal changes, IPoIB throughput reached 19-20 Gb/s
> with just 2 threads. This was really unexpected. Given that, we wanted
> to revisit the use of checksums in IPoIB.
> So, it looks worthwhile to allow for 'checksum-less' IPoIB-CM within a
> cluster on a single subnet. From a checksum perspective, this would be
> no different from RDMA. What are your thoughts?
>
> Thanks
> Pradeep
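For context on why checksumming shows up so prominently in "perf top":
the TCP/UDP checksum is the RFC 1071 Internet checksum, a 16-bit
ones'-complement sum over the whole segment (plus a pseudo-header), so
without offload, or without marking received skbs CHECKSUM_UNNECESSARY,
the CPU has to read every payload byte one more time. Below is a minimal
userspace sketch of that sum, assuming a plain byte buffer. The name
inet_checksum() is made up for illustration; the kernel's real
csum_partial() is an optimized, architecture-specific version of the
same arithmetic.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Illustrative RFC 1071 Internet checksum (hypothetical helper, not a
 * kernel API): 16-bit ones'-complement sum of the data, complemented.
 */
static uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint64_t sum = 0; /* wide accumulator so long buffers cannot overflow */

    /* Sum 16-bit big-endian words; every byte of the buffer is read once. */
    while (len > 1) {
        sum += ((uint64_t)data[0] << 8) | data[1];
        data += 2;
        len -= 2;
    }

    /* A trailing odd byte is treated as if padded with a zero byte. */
    if (len == 1)
        sum += (uint64_t)data[0] << 8;

    /* Fold the carries back into the low 16 bits (end-around carry). */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);

    return (uint16_t)~sum;
}
```

On the RFC 1071 example bytes 00 01 f2 03 f4 f5 f6 f7 this returns
0x220d, and summing the same bytes with 22 0d appended returns 0, which
is how a receiver verifies the checksum in software.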
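The skb_linearize() cost mentioned in the reply is a second full pass
over the data: when the device does not advertise NETIF_F_SG, a packet
whose payload is spread over several fragments has to be copied into one
contiguous buffer before it reaches the driver. The following is a
simplified userspace model of that extra copy, not the kernel code;
struct frag and linearize() are hypothetical names, not part of the skb
API.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one data fragment of a packet. */
struct frag {
    const void *data;
    size_t len;
};

/*
 * Model of linearizing: allocate one contiguous buffer and memcpy()
 * every fragment into it. Returns the buffer (caller frees) and stores
 * the total length in *out_len, or returns NULL on allocation failure.
 */
static unsigned char *linearize(const struct frag *frags, size_t nfrags,
                                size_t *out_len)
{
    size_t total = 0;
    for (size_t i = 0; i < nfrags; i++)
        total += frags[i].len;

    unsigned char *buf = malloc(total ? total : 1);
    if (!buf)
        return NULL;

    /* This loop is the extra copy that a scatter/gather-capable path avoids. */
    size_t off = 0;
    for (size_t i = 0; i < nfrags; i++) {
        memcpy(buf + off, frags[i].data, frags[i].len);
        off += frags[i].len;
    }

    *out_len = total;
    return buf;
}
```

With NETIF_F_SG set, the stack can leave the fragments where they are,
so this copy (and its memory bandwidth) disappears from the transmit
path.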
