Hi Markus,
As Or already mentioned, it seems that we get an accumulation of IP
packets when GRO is enabled over the IB interface. From tcpdump on the
receive side we can see:
10:09:27.336951 IP 11.134.33.1.41377 > 11.134.41.1.35957: Flags [.], seq
3795959253:3796023381, ack 2, win 110, length 64128
10:09:27.336987 IP 11.134.41.1.35957 > 11.134.33.1.41377: Flags [.], ack
3796023381, win 2036, length 0
10:09:27.337022 IP 11.134.33.1.41377 > 11.134.41.1.35957: Flags [.], seq
3796023381:3796087509, ack 2, win 110, length 64128
10:09:27.337044 IP 11.134.41.1.35957 > 11.134.33.1.41377: Flags [.], ack
3796087509, win 3038, length 0
10:09:27.337083 IP 11.134.33.1.41377 > 11.134.41.1.35957: Flags [.], seq
3796087509:3796151637, ack 2, win 110, length 64128
10:09:27.337107 IP 11.134.41.1.35957 > 11.134.33.1.41377: Flags [.], ack
3796151637, win 4040, length 0
10:09:27.337142 IP 11.134.33.1.41377 > 11.134.41.1.35957: Flags [.], seq
3796151637:3796215765, ack 2, win 110, length 64128
.....
....
Don't you see that behaviour in tcpdump? What kernel are you using?
I will take a look into the GRO code and our code to check whether we
missed something, and will send an update.
Thanks, Erez
Hello,
I have a small update on the unfortunate GRO IPoIB behaviour I observed
over the last weeks in datagram mode on our ConnectX cards. In the
GRO receive path the kernel steps into the inet_gro_receive() function
in net/ipv4/af_inet.c. If I read the code correctly, it compares two
IP packets and decides whether they come from the same "flow".
Further checks in some subroutines narrow the comparison
down to IPv4 and so on.
I put a debugging message after the following comparison, which
seems to be the culprit of it all:
inet_gro_receive()
...
/* All fields must match except length and checksum. */
NAPI_GRO_CB(p)->flush |=
(iph->ttl ^ iph2->ttl) |
(iph->tos ^ iph2->tos) |
(__force int)((iph->frag_off ^ iph2->frag_off) & htons(IP_DF)) |
((u16)(ntohs(iph2->id) + NAPI_GRO_CB(p)->count) ^ id);
/* Do some debug */
printk("%i %i %i\n", ntohs(iph2->id), NAPI_GRO_CB(p)->count, id);
...
On a normal GBit Intel card the kernel output reads:
32933 12 32945
32933 13 32946
32946 1 32947
32946 2 32948
...
32946 15 32961
32964 3 32967
32964 4 32968
...
The interpretation of it all should be that a packet's id must match
the sum of the held packet's initial id and its count field. Only then
do we have a GRO merge candidate.
On our ib0 interface the count field of the held packet seems
to be 1 most of the time, and the incoming packet id always matches
the held packet's initial id:
35754 1 35754
35754 1 35754
35754 1 35754
...
35754 1 35786
35786 1 35786
35786 1 35786
...
That's why the flush flag is always set and the GRO stack does
not work at all. I'm willing to dig deeper into this, but I'm unsure
whether those fields are filled on the sender or the receiver side,
and especially where in the IPoIB stack. Maybe someone can point me in
the right direction so that I can dig deeper and provide some more
information.
Best regards.
Markus
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html