2018-08-22 5:19 GMT+08:00 Heiner Kallweit <hkallwe...@gmail.com>:
> On 20.08.2018 05:47, Jian-Hong Pan wrote:
>> 2018-08-20 4:34 GMT+08:00 Heiner Kallweit <hkallwe...@gmail.com>:
>>> The three of you reported an MSI-X-related error when the system
>>> resumes from suspend. This has been fixed for now by disabling MSI-X
>>> on certain chip versions. However, more versions may be affected.
>>>
>>> I checked with Realtek and they confirmed that on certain chip
>>> versions an MSI-X-related value in PCI config space is reset when
>>> resuming from S3.
>>>
>>> I would appreciate it if you could test the following experimental
>>> patch and check whether the warning "MSIX address lost, re-configuring"
>>> appears in your dmesg output after resume from suspend.
>>>
>>> Thanks a lot for your efforts.
>>
>> Tested with the experimental patch on ASUS X441UAR.
>>
>> This is the information before suspend:
>>
>> dev@endless:~$ dmesg | grep r8169
>> [ 10.279565] libphy: r8169: probed
>> [ 10.279947] r8169 0000:02:00.0 eth0: RTL8106e, 0c:9d:92:32:67:b4, XID 44900000, IRQ 127
>> [ 10.445952] r8169 0000:02:00.0 enp2s0: renamed from eth0
>> [ 15.676229] Generic PHY r8169-200:00: attached PHY driver [Generic PHY] (mii_bus:phy_addr=r8169-200:00, irq=IGNORE)
>> [ 17.455392] r8169 0000:02:00.0 enp2s0: Link is Up - 100Mbps/Full - flow control off
>>
>> dev@endless:~$ ip addr show enp2s0
>> 4: enp2s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
>>     link/ether 0c:9d:92:32:67:b4 brd ff:ff:ff:ff:ff:ff
>>     inet 10.100.13.152/24 brd 10.100.13.255 scope global noprefixroute dynamic enp2s0
>>        valid_lft 86347sec preferred_lft 86347sec
>>     inet6 fe80::2873:a2a9:6ca1:c79d/64 scope link noprefixroute
>>        valid_lft forever preferred_lft forever
>>
>> This is the information after resume:
>>
>> dev@endless:~$ dmesg | grep r8169
>> [ 10.279565] libphy: r8169: probed
>> [ 10.279947] r8169 0000:02:00.0 eth0: RTL8106e, 0c:9d:92:32:67:b4, XID 44900000, IRQ 127
>> [ 10.445952] r8169 0000:02:00.0 enp2s0: renamed from eth0
>> [ 15.676229] Generic PHY r8169-200:00: attached PHY driver [Generic PHY] (mii_bus:phy_addr=r8169-200:00, irq=IGNORE)
>> [ 17.455392] r8169 0000:02:00.0 enp2s0: Link is Up - 100Mbps/Full - flow control off
>> [ 95.594265] r8169 0000:02:00.0 enp2s0: Link is Down
>> [ 96.242074] Generic PHY r8169-200:00: attached PHY driver [Generic PHY] (mii_bus:phy_addr=r8169-200:00, irq=IGNORE)
>>
>> dev@endless:~$ ip addr show enp2s0
>> 4: enp2s0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast state DOWN group default qlen 1000
>>     link/ether 0c:9d:92:32:67:b4 brd ff:ff:ff:ff:ff:ff
>>
>> There is no "MSIX address lost, re-configuring" in dmesg.
>> The ethernet interface is still down after resume.
>>
>
> Thanks a lot for testing. Unfortunately I don't have test hardware
> affected by this MSI-X issue, so maybe you can help me to understand
> the issue a little better.
>
> Below is a patch printing the MSI-X table entry in different contexts;
> it's not supposed to fix anything. Could you please let me know
> what the output is on your system?
> I want to get an idea whether the issue clears the complete entry or
> just corrupts certain parts.
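Heiner's question (is the whole entry cleared, or are only some fields corrupted?) comes down to a field-by-field comparison of the four dwords the debug patch prints: lower address, upper address, data and vector control. What follows is a minimal userspace sketch of that comparison, assuming the snapshots have already been copied out of the log. The names struct msix_entry_snapshot, enum msix_diff and msix_compare() are hypothetical and not part of r8169 or the PCI core. Treating an all-0xffffffff result as its own case is a judgment call: an MMIO read that returns all ones usually means no device answered the read, which is a different failure from an entry that was cleared to zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical copy of one MSI-X table entry, in the order the debug patch
 * prints it: lower address, upper address, data, vector control. */
struct msix_entry_snapshot {
	uint32_t words[4];
};

/* Hypothetical classification of a post-resume entry against a saved one. */
enum msix_diff {
	MSIX_UNCHANGED,   /* all four dwords identical */
	MSIX_ALL_ONES,    /* every dword reads 0xffffffff (read not answered?) */
	MSIX_ALL_ZERO,    /* every dword reads 0 (entry cleared) */
	MSIX_PARTIAL,     /* only some dwords differ */
	MSIX_ALL_CHANGED  /* every dword differs, without a uniform pattern */
};

/* Compare two snapshots. If changed_mask is non-NULL, bit i is set for
 * each dword i that differs between before and after. */
static enum msix_diff msix_compare(const struct msix_entry_snapshot *before,
				   const struct msix_entry_snapshot *after,
				   unsigned int *changed_mask)
{
	unsigned int mask = 0, ones = 0, zeros = 0;

	assert(before != NULL && after != NULL);

	for (int i = 0; i < 4; i++) {
		if (before->words[i] != after->words[i])
			mask |= 1u << i;
		if (after->words[i] == 0xffffffffu)
			ones++;
		if (after->words[i] == 0)
			zeros++;
	}
	if (changed_mask != NULL)
		*changed_mask = mask;

	if (mask == 0)
		return MSIX_UNCHANGED;
	if (ones == 4)
		return MSIX_ALL_ONES;
	if (zeros == 4)
		return MSIX_ALL_ZERO;
	return mask == 0xfu ? MSIX_ALL_CHANGED : MSIX_PARTIAL;
}
```

With the values reported in this thread, the RTL8106e pair (suspend fee04004 0 4024 0, resume all ffffffff) classifies as MSIX_ALL_ONES with every field flagged. Heiner's RTL8168E-VL pair (suspend fee02004 0 4028 0, resume fee01004 0 402b 0) classifies as MSIX_PARTIAL, with only the lower-address and data dwords flagged (mask 0x5).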
Here is the test result on ASUS X441UAR with this patch:

dev@endless:~$ dmesg | grep -E "(r8169|enp2s0)"
[ 8.980001] r8169 0000:02:00.0: MSI-X entry: context probe: fee01004 0 40ef 1
[ 8.981594] libphy: r8169: probed
[ 8.981769] r8169 0000:02:00.0 eth0: RTL8106e, 0c:9d:92:32:67:b4, XID 44900000, IRQ 127
[ 9.479848] r8169 0000:02:00.0 enp2s0: renamed from eth0
[ 11.332834] IPv6: ADDRCONF(NETDEV_UP): enp2s0: link is not ready
[ 11.336350] Generic PHY r8169-200:00: attached PHY driver [Generic PHY] (mii_bus:phy_addr=r8169-200:00, irq=IGNORE)
[ 11.574892] IPv6: ADDRCONF(NETDEV_UP): enp2s0: link is not ready
[ 11.581816] r8169 0000:02:00.0 enp2s0: Link is Down
[ 13.190535] r8169 0000:02:00.0 enp2s0: Link is Up - 100Mbps/Full - flow control off
[ 13.190548] IPv6: ADDRCONF(NETDEV_CHANGE): enp2s0: link becomes ready
[ 56.227974] r8169 0000:02:00.0: MSI-X entry: context suspend: fee04004 0 4024 0
[ 56.462464] r8169 0000:02:00.0: MSI-X entry: context resume: ffffffff ffffffff ffffffff ffffffff
[ 58.406713] r8169 0000:02:00.0 enp2s0: Link is Up - 100Mbps/Full - flow control off
[ 58.766740] IPv6: ADDRCONF(NETDEV_UP): enp2s0: link is not ready
[ 58.767331] Generic PHY r8169-200:00: attached PHY driver [Generic PHY] (mii_bus:phy_addr=r8169-200:00, irq=IGNORE)
[ 59.003660] IPv6: ADDRCONF(NETDEV_UP): enp2s0: link is not ready

Uh! The MSI-X entry seems to be lost after resume on this laptop!

Ethernet interface status after resume:

dev@endless:~$ ip addr show enp2s0
3: enp2s0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast state DOWN group default qlen 1000
    link/ether 0c:9d:92:32:67:b4 brd ff:ff:ff:ff:ff:ff

Regards,
Jian-Hong Pan

> That's what I get on my system (RTL8168E-VL). In your case you'll only
> get as far as the first suspend.
>
> [ 3.743404] r8169 0000:03:00.0: MSI-X entry: context probe: fee01004 0 40ef 1
> [ 29.539250] r8169 0000:03:00.0: MSI-X entry: context suspend: fee02004 0 4028 0
> [ 29.837457] r8169 0000:03:00.0: MSI-X entry: context resume: fee01004 0 402b 0
> [ 36.921370] r8169 0000:03:00.0: MSI-X entry: context suspend: fee01004 0 402b 0
> [ 37.239407] r8169 0000:03:00.0: MSI-X entry: context resume: fee01004 0 402b 0
>
> diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
> index 54f53c8c0..f32645119 100644
> --- a/drivers/net/ethernet/realtek/r8169.c
> +++ b/drivers/net/ethernet/realtek/r8169.c
> @@ -11,6 +11,7 @@
>  #include <linux/module.h>
>  #include <linux/moduleparam.h>
>  #include <linux/pci.h>
> +#include <linux/msi.h>
>  #include <linux/netdevice.h>
>  #include <linux/etherdevice.h>
>  #include <linux/delay.h>
> @@ -6822,6 +6823,20 @@ rtl8169_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats)
>  	pm_runtime_put_noidle(&pdev->dev);
>  }
>  
> +static void rtl_print_msix_entry(struct rtl8169_private *tp, const char *context)
> +{
> +	struct msi_desc *desc = first_pci_msi_entry(tp->pci_dev);
> +	u32 data[4];
> +
> +	data[0] = readl(desc->mask_base + PCI_MSIX_ENTRY_LOWER_ADDR);
> +	data[1] = readl(desc->mask_base + PCI_MSIX_ENTRY_UPPER_ADDR);
> +	data[2] = readl(desc->mask_base + PCI_MSIX_ENTRY_DATA);
> +	data[3] = readl(desc->mask_base + PCI_MSIX_ENTRY_VECTOR_CTRL);
> +
> +	dev_info(tp_to_dev(tp), "MSI-X entry: context %s: %x %x %x %x\n",
> +		 context, data[0], data[1], data[2], data[3]);
> +}
> +
>  static void rtl8169_net_suspend(struct net_device *dev)
>  {
>  	struct rtl8169_private *tp = netdev_priv(dev);
> @@ -6846,9 +6861,12 @@ static int rtl8169_suspend(struct device *device)
>  {
>  	struct pci_dev *pdev = to_pci_dev(device);
>  	struct net_device *dev = pci_get_drvdata(pdev);
> +	struct rtl8169_private *tp = netdev_priv(dev);
>  
>  	rtl8169_net_suspend(dev);
>  
> +	rtl_print_msix_entry(tp, "suspend");
> +
>  	return 0;
>  }
>  
> @@ -6875,6 +6893,9 @@ static int rtl8169_resume(struct device *device)
>  {
>  	struct pci_dev *pdev = to_pci_dev(device);
>  	struct net_device *dev = pci_get_drvdata(pdev);
> +	struct rtl8169_private *tp = netdev_priv(dev);
> +
> +	rtl_print_msix_entry(tp, "resume");
>  
>  	if (netif_running(dev))
>  		__rtl8169_resume(dev);
> @@ -7075,11 +7096,6 @@ static int rtl_alloc_irq(struct rtl8169_private *tp)
>  		RTL_W8(tp, Config2, RTL_R8(tp, Config2) & ~MSIEnable);
>  		RTL_W8(tp, Cfg9346, Cfg9346_Lock);
>  		flags = PCI_IRQ_LEGACY;
> -	} else if (tp->mac_version == RTL_GIGA_MAC_VER_40) {
> -		/* This version was reported to have issues with resume
> -		 * from suspend when using MSI-X
> -		 */
> -		flags = PCI_IRQ_LEGACY | PCI_IRQ_MSI;
>  	} else {
>  		flags = PCI_IRQ_ALL_TYPES;
>  	}
> @@ -7354,6 +7370,8 @@ static int rtl_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
>  		return rc;
>  	}
>  
> +	rtl_print_msix_entry(tp, "probe");
> +
>  	tp->saved_wolopts = __rtl8169_get_wol(tp);
>  
>  	mutex_init(&tp->wk.mutex);
> -- 
> 2.18.0
>
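The RTL8168E-VL entries Heiner posted change between suspend and resume (fee02004/4028 at suspend, fee01004/402b after resume). On x86, the lower-address and data dwords can be decoded with the MSI message format documented in Intel's Software Developer's Manual. Address bits 31:20 are fixed at 0xFEE, bits 19:12 hold the destination ID, bit 3 is the redirection hint and bit 2 the destination mode. Data bits 7:0 hold the vector, bits 10:8 the delivery mode, bit 14 the level and bit 15 the trigger mode. The sketch below assumes x86 and also reports the PCI-defined per-vector Mask Bit (bit 0 of the vector control dword). struct x86_msi_decoded and decode_x86_msi() are hypothetical helpers, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded view of an x86 MSI/MSI-X message (x86 only). */
struct x86_msi_decoded {
	bool in_msi_window;   /* address bits 31:20 == 0xFEE */
	uint8_t dest_id;      /* address bits 19:12 */
	bool redir_hint;      /* address bit 3 */
	bool logical_dest;    /* address bit 2: 1 = logical, 0 = physical */
	uint8_t vector;       /* data bits 7:0 */
	uint8_t delivery;     /* data bits 10:8 (0 = fixed) */
	bool level_assert;    /* data bit 14 */
	bool level_trigger;   /* data bit 15: 1 = level, 0 = edge */
	bool masked;          /* vector control bit 0 (PCI MSI-X Mask Bit) */
};

/* Decode the lower-address, data and vector-control dwords of one entry.
 * The upper-address dword is not needed for this decode. */
static struct x86_msi_decoded decode_x86_msi(uint32_t addr_lo, uint32_t data,
					     uint32_t vector_ctrl)
{
	struct x86_msi_decoded d;

	d.in_msi_window = (addr_lo >> 20) == 0xfeeu;
	d.dest_id = (uint8_t)((addr_lo >> 12) & 0xffu);
	d.redir_hint = (addr_lo >> 3) & 1u;
	d.logical_dest = (addr_lo >> 2) & 1u;
	d.vector = (uint8_t)(data & 0xffu);
	d.delivery = (uint8_t)((data >> 8) & 0x7u);
	d.level_assert = (data >> 14) & 1u;
	d.level_trigger = (data >> 15) & 1u;
	d.masked = vector_ctrl & 1u;
	return d;
}
```

Decoded this way, both RTL8168E-VL entries stay inside the 0xFEE window with the same destination mode and delivery mode. Only the destination ID (0x02 to 0x01) and the vector (0x28 to 0x2b) differ, so the entry remains well formed. That is consistent with the interrupt simply being re-targeted across suspend rather than corrupted, though the printout alone cannot confirm it. The all-ones RTL8106e read, by contrast, falls outside the 0xFEE window entirely. The probe entry (fee01004 0 40ef 1) decodes as vector 0xef with the mask bit set.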