On 8/16/19 2:09 PM, Juliana Rodrigueiro wrote:
> Greetings!
>
> During migration from kernel 3.14 to 4.19, we noticed a regression in
> network performance. Under the exact same circumstances, the standard
> deviation of the latency has more than doubled on the Realtek
> RTL8111/8168B (10ec:8168) using the r8169 driver.
>
> Kernel 3.14:
> # netperf -v 2 -P 0 -H <netserver-IP>,4 -I 99,5 -t omni -l 1 -- -O
> STDDEV_LATENCY -m 64K -d Send
> 313.37
>
> Kernel 4.19:
> # netperf -v 2 -P 0 -H <netserver-IP>,4 -I 99,5 -t omni -l 1 -- -O
> STDDEV_LATENCY -m 64K -d Send
> 632.96
>
> In contrast, we noticed small improvements in performance with other
> non-Realtek network cards (igb, tg3), which suggested a possible
> driver-related bug.
>
> However, after bisecting the code, I ended up with the following patch, which
> was introduced in kernel 4.17 and modifies net/ipv4:
>
> commit 0a6b2a1dc2a2105f178255fe495eb914b09cb37a
> Author: Eric Dumazet <[email protected]>
> Date: Mon Feb 19 11:56:47 2018 -0800
>
> tcp: switch to GSO being always on
>
> Could you please help me to clarify, should GSO be always on on my device? Or
> does it just affect TCP? According to ethtool it is always off, "ethtool -K
> eth0 gso on" has no effect, unless I switch SG on.
>
> # ethtool -k eth0
> Offload parameters for eth0:
> Cannot get device udp large send offload settings: Operation not supported
> rx-checksumming: on
> tx-checksumming: off
> scatter-gather: off
> tcp-segmentation-offload: off
> udp-fragmentation-offload: off
> generic-segmentation-offload: off
> generic-receive-offload: on
> large-receive-offload: off
>
> I validated that reverting "tcp: switch to GSO being always on" successfully
> brings back the better performance for the r8169 driver.
>
> I'm sure that reverting that commit is not the optimal solution, so I would
> like to kindly ask for help to shed some light on this issue.

Hi Juliana,

I am sure that any commit to the TCP stack can show a regression on a
particular combination of packet sizes, MTU size, NIC, and measured metric.

Basically, if your NIC does not support SG and TSO, there is a possibility
that the changes we made to enter the era of 100Gbit and 200Gbit NICs might
hurt a bit.

Lack of SG means that the lower stack might have to perform memory allocations
to perform the segmentation and this might be slow (or even fail) under memory
pressure.
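
To make the allocation point concrete, here is a simplified userspace sketch, not kernel code. The names `struct seg` and `segment_linear` are hypothetical and exist only for this illustration. It assumes a model in which, without SG, every MSS-sized segment needs its own contiguous buffer holding a copy of the header plus its slice of the payload, so segmenting one large send costs one allocation and two copies per segment, and any of those allocations can fail.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of one linear (non-scatter-gather) packet buffer. */
struct seg {
    unsigned char *data;   /* header followed by payload, contiguous */
    size_t len;            /* total bytes in data */
};

/*
 * Split `payload` into chunks of at most `mss` bytes, each prefixed with a
 * copy of `hdr`. Returns the number of segments written to `out`, or -1 if
 * an allocation fails or `out` is too small (everything allocated so far
 * is freed in that case).
 */
static int segment_linear(const unsigned char *hdr, size_t hdr_len,
                          const unsigned char *payload, size_t payload_len,
                          size_t mss, struct seg *out, size_t out_cap)
{
    size_t off = 0, n = 0;

    assert(mss > 0);
    while (off < payload_len) {
        size_t chunk = payload_len - off < mss ? payload_len - off : mss;

        if (n == out_cap)
            goto fail;
        out[n].data = malloc(hdr_len + chunk);   /* one allocation per segment */
        if (!out[n].data)
            goto fail;
        memcpy(out[n].data, hdr, hdr_len);                    /* copy header */
        memcpy(out[n].data + hdr_len, payload + off, chunk);  /* copy payload */
        out[n].len = hdr_len + chunk;
        off += chunk;
        n++;
    }
    return (int)n;

fail:
    while (n > 0)
        free(out[--n].data);
    return -1;
}
```

With SG, a driver could instead point descriptors at the header and at slices of the original payload, which avoids the per-segment allocation and payload copy in this model.
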
I have no idea why you can even turn on SG, if it is turned off by default.

Please give us more information on the NIC:

ethtool -i eth0 ; ifconfig eth0

Possibly try a more recent ethtool; yours seems pretty old.

I also see this relevant commit. I have no idea why SG would have any relation
with TSO.

commit a7eb6a4f2560d5ae64bfac98d79d11378ca2de6c
Author: Holger Hoffstätte <[email protected]>
Date: Fri Aug 9 00:02:40 2019 +0200

    r8169: fix performance issue on RTL8168evl

    Disabling TSO but leaving SG active results is a significant
    performance drop. Therefore disable also SG on RTL8168evl.
    This restores the original performance.

    Fixes: 93681cd7d94f ("r8169: enable HW csum and TSO")
    Signed-off-by: Holger Hoffstätte <[email protected]>
    Signed-off-by: Heiner Kallweit <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
index b2a275d8504c..912bd41eaa1b 100644
--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -6898,9 +6898,9 @@ static int rtl_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 	/* RTL8168e-vl has a HW issue with TSO */
 	if (tp->mac_version == RTL_GIGA_MAC_VER_34) {
-		dev->vlan_features &= ~NETIF_F_ALL_TSO;
-		dev->hw_features &= ~NETIF_F_ALL_TSO;
-		dev->features &= ~NETIF_F_ALL_TSO;
+		dev->vlan_features &= ~(NETIF_F_ALL_TSO | NETIF_F_SG);
+		dev->hw_features &= ~(NETIF_F_ALL_TSO | NETIF_F_SG);
+		dev->features &= ~(NETIF_F_ALL_TSO | NETIF_F_SG);
 	}
 	dev->hw_features |= NETIF_F_RXALL;
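
For anyone not used to the idiom in that patch, `x &= ~(A | B)` clears both feature bits in a single step. The standalone sketch below shows the effect. The `FEAT_*` macros and both helper functions are hypothetical, and the bit positions are made up for illustration; they are not the kernel's NETIF_F_* values.

```c
#include <stdint.h>

/* Illustrative bit positions only; not the kernel's NETIF_F_* values. */
#define FEAT_SG      (UINT64_C(1) << 0)
#define FEAT_HW_CSUM (UINT64_C(1) << 1)
#define FEAT_TSO     (UINT64_C(1) << 2)
#define FEAT_TSO6    (UINT64_C(1) << 3)
#define FEAT_ALL_TSO (FEAT_TSO | FEAT_TSO6)

/* Return `features` with every bit in `mask` cleared. */
static uint64_t features_clear(uint64_t features, uint64_t mask)
{
    return features & ~mask;
}

/* Same shape as the patched quirk: drop TSO and SG together. */
static uint64_t quirk_no_tso_no_sg(uint64_t features)
{
    return features_clear(features, FEAT_ALL_TSO | FEAT_SG);
}
```

Starting from a set with SG, checksum offload, and both TSO bits, `quirk_no_tso_no_sg` leaves only the checksum bit, whereas clearing `FEAT_ALL_TSO` alone would have left SG enabled, which is the state the commit message describes as slow.
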