This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.

If verification is not done within 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

** Tags added: verification-needed-bionic

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1815033

Title:
  qlcnic: Firmware aborts/hangs in QLogic NIC

Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Bionic:
  Fix Committed

Bug description:
  [Impact]

  * In multi-queue configurations of the qlcnic driver, there is a corner case in which TX queue zero is used for regular data transmission by one CPU while, at the same time, another CPU uses the same queue descriptor for MAC configuration.

  * When this race happens, it can corrupt TX queue zero, the net result being firmware aborts/hangs with no apparent cause. The following kernel log messages were collected during the corruption event:

  qlcnic 0000:01:00.0: Pause control frames disabled on all ports
  qlcnic 0000:01:00.0: firmware hang detected
  qlcnic 0000:01:00.0: Dumping hw/fw registers
  PEG_HALT_STATUS1: 0x40001502, PEG_HALT_STATUS2: 0x3de7a0,
  PEG_NET_0_PC: 0x6d268, PEG_NET_1_PC: 0x6d2ac,
  PEG_NET_2_PC: 0x149, PEG_NET_3_PC: 0x6e105,
  PEG_NET_4_PC: 0x1e00b
  [...]
  qlcnic 0000:01:00.0: Detected state change from DEV_NEED_RESET, skipping ack check

  * The following device is known to suffer from the issue (lspci output), although a whole class of devices (named the 82XX series by the vendor) is susceptible to it:

  01:00.0 Ethernet controller: QLogic Corp. cLOM8214 1/10GbE Controller [1077:8020]

  * The fix is the following patch, present in the mainline kernel as well as in supported stable branches:

  c333fa0c4f22 ("qlcnic: fix Tx descriptor corruption on 82xx devices").
  Link for the patch in Linus tree:
  http://git.kernel.org/linus/c333fa0c4f22

  [Test Case]

  * Unfortunately this is not easy to reproduce; we have a user report of the issue with a fairly reliable reproducer: the user is running an NFS workload on top of the above PCI adapter. The problem goes away with the patch proposed here for SRU. It happens on both the 4.4 and 4.15 kernels, and the patch fixes it on both. (Note that this is a Bionic-only SRU, since the Ubuntu 4.4 kernel got the patch from Greg's supported stable branch.)

  [Regression Potential]

  * The patch scope is restricted to a single driver, and the code itself is self-contained: basically a restriction to a specific tx_ring when setting filters. There is potential for regressions in this path of the driver, which could for example cause different firmware issues, but user testing showed good reliability: without the patch the issue happens ~6h after machine boot; with the patch the machine ran for more than 8 days without issues.

  * The patch is also present in the mainline kernel as well as in supported stable branches, and is already present in the Ubuntu 4.4 kernel.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1815033/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to     : email@example.com
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
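
For anyone verifying the kernel in -proposed, here is a minimal sketch of how a saved kernel log could be checked for the firmware-hang signature quoted in the bug description. The function name `find_qlcnic_hangs` and the sample text are illustrative assumptions and not part of the SRU; only the matched message ("firmware hang detected") comes from the log above.

```python
import re

# Signature taken from the kernel log quoted in the bug description.
HANG_RE = re.compile(r"qlcnic (\S+): firmware hang detected")


def find_qlcnic_hangs(log_text):
    """Return the PCI address of each qlcnic firmware-hang message found."""
    return [match.group(1) for match in HANG_RE.finditer(log_text)]


if __name__ == "__main__":
    # Sample lines copied from the bug description; replace with a real log.
    sample = (
        "qlcnic 0000:01:00.0: Pause control frames disabled on all ports\n"
        "qlcnic 0000:01:00.0: firmware hang detected\n"
        "qlcnic 0000:01:00.0: Dumping hw/fw registers\n"
    )
    print(find_qlcnic_hangs(sample))
```

An empty result only means that no hang was logged in the text that was scanned. Since the report says the issue can take about 6 hours to appear without the patch, the log should cover a comparably long run of the reproducer workload before the fix is marked as verified.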