On Mon, Oct 28, 2013 at 12:34 PM, Markus Stockhausen wrote:
> Hello,
>
> about two months ago we had some problems with IPoIB transfer speeds.
> See http://marc.info/?l=linux-rdma&m=137823326109158&w=2 for details.
> After some quite hard test iterations the problem seems to come from the
> IPoIB switch from LRO to GRO between kernels 2.6.37 and 2.6.38.
>
> I built a test setup with a 2.6.38 kernel and additionally compiled a 2.6.37
> ib_ipoib module against it. This way I can run a direct comparison
> between the old and new module. The major difference between the
> two versions is inside the ipoib_ib_handle_rx_wc() function:
>
> 2.6.37: lro_receive_skb(&priv->lro.lro_mgr, skb, NULL);
> 2.6.38: napi_gro_receive(&priv->napi, skb);
>
> As in the last post we use ConnectX cards in datagram mode with a
> 2044 MTU. We read a file sequentially from an NFS server into /dev/null.
> We just want to get the wire speed, neglecting hard drives. The
> hardware is slightly newer so we get different transfer speeds but
> the overall effect should be evident. The server uses a 3.5 kernel and
> is not changed during the tests.
>
> With the 2.6.37 IPoIB module on the client side and LRO enabled the
> speed is 950 MByte/sec.
> On the NFS server side a tcpdump trace reads like:
>
> 19:51:51.432630 IP 10.10.30.251.nfs > 10.10.30.1.781:
> Flags [P.], seq 1008434065:1008497161, ack 617432,
> win 688, options [nop,nop,TS val 133047292 ecr 429568],
> length 63096
> 19:51:51.432672 IP 10.10.30.1.781 > 10.10.30.251.nfs:
> Flags [.], ack 1008241041, win 24576, options
> [nop,nop,TS val 429568 ecr 133047292], length 0
> 19:51:51.432677 IP 10.10.30.251.nfs > 10.10.30.1.781:
> Flags [.], seq 1008497161:1008560905, ack 617432,
> win 688, options [nop,nop,TS val 133047292 ecr 429568],
> length 63744
> 19:51:51.432725 IP 10.10.30.1.781 > 10.10.30.251.nfs:
> Flags [.], ack 1008304585, win 24576, options
> [nop,nop,TS val 429568 ecr 133047292], length 0
> 19:51:51.432729 IP 10.10.30.251.nfs > 10.10.30.1.781:
> Flags [.], seq 1008560905:1008624649, ack 617432,
> win 688, options [nop,nop,TS val 133047292 ecr 429568],
> length 63744
>
> With some slight differences here and there the client sends only
> 1 ack for about 60k of transferred data. With the 2.6.38 module and
> onwards (GRO enabled) the speed drops to 380 MByte/sec
> and the transfer pattern changes:
>
> 19:58:14.631430 IP 10.10.30.251.nfs > 10.10.30.1.ircs:
> Flags [.], seq 722492293:722502253, ack 442312, win 537,
> options [nop,nop,TS val 133143092 ecr 467889], length 9960
> 19:58:14.631460 IP 10.10.30.1.ircs > 10.10.30.251.nfs:
> Flags [.], ack 722478181, win 24562, options
> [nop,nop,TS val 467889 ecr 133143092], length 0
> 19:58:14.631485 IP 10.10.30.1.ircs > 10.10.30.251.nfs:
> Flags [.], ack 722478181, win 24562, options
> [nop,nop,TS val 467889 ecr 133143092,nop,nop,sack 1
> {722480117:722482333}], length 0
> 19:58:14.631510 IP 10.10.30.1.ircs > 10.10.30.251.nfs:
> Flags [.], ack 722488197, win 24562, options [nop,nop,TS
> val 467889 ecr 133143092], length 0
> 19:58:14.631534 IP 10.10.30.1.ircs > 10.10.30.251.nfs:
> Flags [.], ack 722494229, win 24562, options
> [nop,nop,TS val 467889 ecr 133143092], length 0
>
> It seems as if the NFS client acknowledges every 2K packet
> separately. I thought that it may come from missing
> coalescing parameters and tried an "ethtool -C ib0 rx-usecs 5"
> on both machines but without success.
>
> I'm quite lost now. Maybe someone can give me a tip if I'm
> missing something.
>
Nice work! Looks like napi_gro_receive() does not do the work it is
supposed to do?! My (embedded NFS client) system was on a 2.6.38 kernel,
but we use the ipoib kmod from OFED 1.5.4.1, so we're still on the
lro_receive_skb() path that does not have this issue. I'll try it out
later this week to see what is going on.

Mellanox folks or Roland may have more to say.

-- Wendy
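
For anyone who wants to put numbers on the two quoted traces, here is a minimal sketch that computes how many bytes each successive pure ACK from the client newly acknowledges. It is not part of the original test setup. The helper name `ack_advances`, the regular expression, and the client address prefix are assumptions made for illustration, and each tcpdump record is assumed to be joined onto a single line. The MSS value is also an assumption: the 2044-byte MTU minus IPv4, TCP, and timestamp-option headers.

```python
import re

# One tcpdump record per string, joined onto a single line (assumed layout).
RECORD_RE = re.compile(
    r"IP (?P<src>\S+) > \S+: .*? ack (?P<ack>\d+),.* length (?P<length>\d+)",
    re.S,
)

# Assumed MSS: 2044-byte MTU minus 20 (IPv4), 20 (TCP), 12 (timestamp option).
MSS = 2044 - 20 - 20 - 12  # 1992


def ack_advances(records, client_prefix):
    """Bytes newly acknowledged by each pure ACK from the client, in order."""
    advances = []
    last_ack = None
    for rec in records:
        m = RECORD_RE.search(rec)
        if m is None or not m.group("src").startswith(client_prefix):
            continue  # not a record sent by the client
        if int(m.group("length")) != 0:
            continue  # carries data, so not a pure ACK
        ack = int(m.group("ack"))
        if last_ack is not None:
            advances.append(ack - last_ack)
        last_ack = ack
    return advances


# Client ACK records copied from the LRO trace (2.6.37 module).
LRO_TRACE = [
    "19:51:51.432672 IP 10.10.30.1.781 > 10.10.30.251.nfs: Flags [.], "
    "ack 1008241041, win 24576, options [nop,nop,TS val 429568 ecr 133047292], length 0",
    "19:51:51.432725 IP 10.10.30.1.781 > 10.10.30.251.nfs: Flags [.], "
    "ack 1008304585, win 24576, options [nop,nop,TS val 429568 ecr 133047292], length 0",
]

# Records copied from the GRO trace (2.6.38 module); the first one is server data.
GRO_TRACE = [
    "19:58:14.631430 IP 10.10.30.251.nfs > 10.10.30.1.ircs: Flags [.], "
    "seq 722492293:722502253, ack 442312, win 537, "
    "options [nop,nop,TS val 133143092 ecr 467889], length 9960",
    "19:58:14.631460 IP 10.10.30.1.ircs > 10.10.30.251.nfs: Flags [.], "
    "ack 722478181, win 24562, options [nop,nop,TS val 467889 ecr 133143092], length 0",
    "19:58:14.631485 IP 10.10.30.1.ircs > 10.10.30.251.nfs: Flags [.], "
    "ack 722478181, win 24562, options [nop,nop,TS val 467889 ecr 133143092,"
    "nop,nop,sack 1 {722480117:722482333}], length 0",
    "19:58:14.631510 IP 10.10.30.1.ircs > 10.10.30.251.nfs: Flags [.], "
    "ack 722488197, win 24562, options [nop,nop,TS val 467889 ecr 133143092], length 0",
    "19:58:14.631534 IP 10.10.30.1.ircs > 10.10.30.251.nfs: Flags [.], "
    "ack 722494229, win 24562, options [nop,nop,TS val 467889 ecr 133143092], length 0",
]

print(ack_advances(LRO_TRACE, "10.10.30.1."))  # [63544]
print(ack_advances(GRO_TRACE, "10.10.30.1."))  # [0, 10016, 6032]
print(63744 // MSS, 9960 // MSS)               # 32 5
```

On these short excerpts the LRO case advances the ACK number by roughly 62 KB in one step, while the GRO case shows a duplicate ACK with a SACK block followed by much smaller steps. Under the assumed 1992-byte MSS, the 63744-byte and 9960-byte data segments seen on the server are exactly 32 and 5 full segments. A longer capture would be needed before drawing firm conclusions.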
