After updating a server with an Intel 10Gbase-T NIC from linux-4.4.1 to linux-4.6.1 (vanilla, stable), we experienced the following bug after about 2 days of operation:

Jun 6 19:09:31 computer kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=04:00.0 domain=0x000e address=0x000000001004ecc0 flags=0x0050]
Jun 6 19:09:31 computer kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=04:00.0 domain=0x000e address=0x000000001004ed00 flags=0x0050]
Jun 6 19:09:35 computer kernel: ixgbe 0000:04:00.0 enp4s0: Detected Tx Unit Hang#012 Tx Queue <3>#012 TDH, TDT <1ce>, <1e6>#012 next_to_use <1e6>#012 next_to_clean <1ce>#012tx_buffer_info[next_to_clean]#012 time_stamp <10f7b215d>#012 jiffies <10f7b3244>
Jun 6 19:09:35 computer kernel: ixgbe 0000:04:00.0 enp4s0: Detected Tx Unit Hang#012 Tx Queue <1>#012 TDH, TDT <fc>, <108>#012 next_to_use <108>#012 next_to_clean <fc>#012tx_buffer_info[next_to_clean]#012 time_stamp <10f7b28c5>#012 jiffies <10f7b3244>
Jun 6 19:09:35 computer kernel: ixgbe 0000:04:00.0 enp4s0: Detected Tx Unit Hang#012 Tx Queue <0>#012 TDH, TDT <16b>, <16f>#012 next_to_use <16f>#012 next_to_clean <16b>#012tx_buffer_info[next_to_clean]#012 time_stamp <10f7b21d0>#012 jiffies <10f7b3244>
Jun 6 19:09:35 computer kernel: ixgbe 0000:04:00.0 enp4s0: Detected Tx Unit Hang#012 Tx Queue <4>#012 TDH, TDT <69>, <8b>#012 next_to_use <8b>#012 next_to_clean <69>#012tx_buffer_info[next_to_clean]#012 time_stamp <10f7b215d>#012 jiffies <10f7b3244>
Jun 6 19:09:35 computer kernel: ixgbe 0000:04:00.0 enp4s0: tx hang 1 detected on queue 1, resetting adapter
Jun 6 19:09:35 computer kernel: ixgbe 0000:04:00.0 enp4s0: tx hang 1 detected on queue 0, resetting adapter
Jun 6 19:09:35 computer kernel: ixgbe 0000:04:00.0 enp4s0: Detected Tx Unit Hang#012 Tx Queue <10>#012 TDH, TDT <1c3>, <1c9>#012 next_to_use <1c9>#012 next_to_clean <1c3>#012tx_buffer_info[next_to_clean]#012 time_stamp <10f7b215d>#012 jiffies <10f7b3244>
Jun 6 19:09:35 computer kernel: ixgbe 0000:04:00.0 enp4s0: tx hang 1 detected on queue 4, resetting adapter
Jun 6 19:09:35 computer kernel: ixgbe 0000:04:00.0 enp4s0: initiating reset due to tx timeout
Jun 6 19:09:35 computer kernel: ixgbe 0000:04:00.0 enp4s0: initiating reset due to tx timeout
Jun 6 19:09:35 computer kernel: ixgbe 0000:04:00.0 enp4s0: tx hang 1 detected on queue 10, resetting adapter
Jun 6 19:09:35 computer kernel: ixgbe 0000:04:00.0 enp4s0: initiating reset due to tx timeout
Jun 6 19:09:35 computer kernel: ixgbe 0000:04:00.0 enp4s0: initiating reset due to tx timeout
Jun 6 19:09:35 computer kernel: ixgbe 0000:04:00.0 enp4s0: Reset adapter
Jun 6 19:09:35 computer kernel: ixgbe 0000:04:00.0 enp4s0: tx hang 2 detected on queue 3, resetting adapter
Jun 6 19:09:35 computer kernel: ixgbe 0000:04:00.0: master disable timed out
Jun 6 19:09:36 computer kernel: br0: port 1(enp4s0) entered disabled state
Jun 6 19:09:42 computer kernel: ixgbe 0000:04:00.0 enp4s0: NIC Link is Up 10 Gbps, Flow Control: RX/TX
Jun 6 19:09:42 computer kernel: br0: port 1(enp4s0) entered blocking state
Jun 6 19:09:42 computer kernel: br0: port 1(enp4s0) entered forwarding state
Jun 6 19:09:44 computer kernel: ixgbe 0000:04:00.0 enp4s0: Detected Tx Unit Hang#012 Tx Queue <12>#012 TDH, TDT <0>, <2>#012 next_to_use <2>#012 next_to_clean <0>#012tx_buffer_info[next_to_clean]#012 time_stamp <10f7b4c20>#012 jiffies <10f7b544c>
Jun 6 19:09:44 computer kernel: ixgbe 0000:04:00.0 enp4s0: tx hang 2 detected on queue 12, resetting adapter
Jun 6 19:09:44 computer kernel: ixgbe 0000:04:00.0 enp4s0: initiating reset due to tx timeout
Jun 6 19:09:44 computer kernel: ixgbe 0000:04:00.0 enp4s0: Reset adapter
Jun 6 19:09:44 computer kernel: ixgbe 0000:04:00.0 enp4s0: RXDCTL.ENABLE on Rx queue 0 not cleared within the polling period
Jun 6 19:09:44 computer kernel: ixgbe 0000:04:00.0 enp4s0: RXDCTL.ENABLE on Rx queue 1 not cleared within the polling period
Jun 6 19:09:44 computer kernel: ixgbe 0000:04:00.0 enp4s0: RXDCTL.ENABLE on Rx queue 2 not cleared within the polling period
Jun 6 19:09:44 computer kernel: ixgbe 0000:04:00.0 enp4s0: RXDCTL.ENABLE on Rx queue 3 not cleared within the polling period
Jun 6 19:09:44 computer kernel: ixgbe 0000:04:00.0 enp4s0: RXDCTL.ENABLE on Rx queue 4 not cleared within the polling period
Jun 6 19:09:44 computer kernel: ixgbe 0000:04:00.0 enp4s0: RXDCTL.ENABLE on Rx queue 5 not cleared within the polling period
Jun 6 19:09:44 computer kernel: ixgbe 0000:04:00.0 enp4s0: RXDCTL.ENABLE on Rx queue 6 not cleared within the polling period
Jun 6 19:09:44 computer kernel: ixgbe 0000:04:00.0 enp4s0: RXDCTL.ENABLE on Rx queue 7 not cleared within the polling period
Jun 6 19:09:44 computer kernel: ixgbe 0000:04:00.0 enp4s0: RXDCTL.ENABLE on Rx queue 8 not cleared within the polling period
Jun 6 19:09:44 computer kernel: ixgbe 0000:04:00.0 enp4s0: RXDCTL.ENABLE on Rx queue 9 not cleared within the polling period
Jun 6 19:09:44 computer kernel: ixgbe 0000:04:00.0 enp4s0: RXDCTL.ENABLE on Rx queue 10 not cleared within the polling period
Jun 6 19:09:44 computer kernel: ixgbe 0000:04:00.0 enp4s0: RXDCTL.ENABLE on Rx queue 11 not cleared within the polling period
Jun 6 19:09:44 computer kernel: ixgbe 0000:04:00.0 enp4s0: RXDCTL.ENABLE on Rx queue 12 not cleared within the polling period
Jun 6 19:09:44 computer kernel: ixgbe 0000:04:00.0 enp4s0: RXDCTL.ENABLE on Rx queue 13 not cleared within the polling period
Jun 6 19:09:44 computer kernel: ixgbe 0000:04:00.0 enp4s0: RXDCTL.ENABLE on Rx queue 14 not cleared within the polling period
Jun 6 19:09:44 computer kernel: ixgbe 0000:04:00.0 enp4s0: RXDCTL.ENABLE on Rx queue 15 not cleared within the polling period
Jun 6 19:09:44 computer kernel: ixgbe 0000:04:00.0 enp4s0: RXDCTL.ENABLE on Rx queue 16 not cleared within the polling period
Jun 6 19:09:44 computer kernel: ixgbe 0000:04:00.0 enp4s0: RXDCTL.ENABLE on Rx queue 17 not cleared within the polling period
Jun 6 19:09:44 computer kernel: ixgbe 0000:04:00.0 enp4s0: RXDCTL.ENABLE on Rx queue 18 not cleared within the polling period
Jun 6 19:09:44 computer kernel: ixgbe 0000:04:00.0 enp4s0: RXDCTL.ENABLE on Rx queue 19 not cleared within the polling period
Jun 6 19:09:44 computer kernel: ixgbe 0000:04:00.0 enp4s0: RXDCTL.ENABLE on Rx queue 20 not cleared within the polling period
Jun 6 19:09:44 computer kernel: ixgbe 0000:04:00.0 enp4s0: RXDCTL.ENABLE on Rx queue 21 not cleared within the polling period
Jun 6 19:09:44 computer kernel: ixgbe 0000:04:00.0 enp4s0: RXDCTL.ENABLE on Rx queue 22 not cleared within the polling period
Jun 6 19:09:44 computer kernel: ixgbe 0000:04:00.0 enp4s0: RXDCTL.ENABLE on Rx queue 23 not cleared within the polling period
Jun 6 19:09:44 computer kernel: ixgbe 0000:04:00.0 enp4s0: RXDCTL.ENABLE on Rx queue 24 not cleared within the polling period
Jun 6 19:09:44 computer kernel: ixgbe 0000:04:00.0 enp4s0: RXDCTL.ENABLE on Rx queue 25 not cleared within the polling period
Jun 6 19:09:44 computer kernel: ixgbe 0000:04:00.0 enp4s0: RXDCTL.ENABLE on Rx queue 26 not cleared within the polling period
Jun 6 19:09:44 computer kernel: ixgbe 0000:04:00.0 enp4s0: RXDCTL.ENABLE on Rx queue 27 not cleared within the polling period
Jun 6 19:09:44 computer kernel: ixgbe 0000:04:00.0 enp4s0: RXDCTL.ENABLE on Rx queue 28 not cleared within the polling period
Jun 6 19:09:44 computer kernel: ixgbe 0000:04:00.0 enp4s0: RXDCTL.ENABLE on Rx queue 29 not cleared within the polling period
Jun 6 19:09:44 computer kernel: ixgbe 0000:04:00.0 enp4s0: RXDCTL.ENABLE on Rx queue 30 not cleared within the polling period
Jun 6 19:09:44 computer kernel: ixgbe 0000:04:00.0 enp4s0: RXDCTL.ENABLE on Rx queue 31 not cleared within the polling period
Jun 6 19:09:44 computer kernel: ixgbe 0000:04:00.0: master disable timed out
Jun 6 19:09:44 computer kernel: ixgbe 0000:04:00.0 enp4s0: RXDCTL.ENABLE on Rx queue 0 not cleared within the polling period
...
Jun 6 19:09:44 computer kernel: ixgbe 0000:04:00.0 enp4s0: RXDCTL.ENABLE on Rx queue 31 not cleared within the polling period
Jun 6 19:09:45 computer kernel: br0: port 1(enp4s0) entered disabled state
Jun 6 19:09:50 computer kernel: ixgbe 0000:04:00.0 enp4s0: NIC Link is Up 10 Gbps, Flow Control: RX/TX
Jun 6 19:09:50 computer kernel: br0: port 1(enp4s0) entered blocking state
Jun 6 19:09:50 computer kernel: br0: port 1(enp4s0) entered forwarding state
Jun 6 19:09:53 computer kernel: ixgbe 0000:04:00.0 enp4s0: Detected Tx Unit Hang#012 Tx Queue <24>#012 TDH, TDT <0>, <5>#012 next_to_use <5>#012 next_to_clean <0>#012tx_buffer_info[next_to_clean]#012 time_stamp <10f7b6e20>#012 jiffies <10f7b767c>
Jun 6 19:09:53 computer kernel: ixgbe 0000:04:00.0 enp4s0: tx hang 3 detected on queue 24, resetting adapter
Jun 6 19:09:53 computer kernel: ixgbe 0000:04:00.0 enp4s0: initiating reset due to tx timeout
Jun 6 19:09:53 computer kernel: ixgbe 0000:04:00.0 enp4s0: Reset adapter
Jun 6 19:09:53 computer kernel: ixgbe 0000:04:00.0 enp4s0: RXDCTL.ENABLE on Rx queue 0 not cleared within the polling period
...
Jun 6 19:09:53 computer kernel: ixgbe 0000:04:00.0 enp4s0: RXDCTL.ENABLE on Rx queue 31 not cleared within the polling period
Jun 6 19:09:53 computer kernel: ixgbe 0000:04:00.0: master disable timed out

The ixgbe module was not able to restore the link after this; only "rmmod" plus a new initialization of the interface restored connectivity.

Any idea what's going wrong here?

Regards,

Lutz Vieweg

_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel® Ethernet, visit http://communities.intel.com/community/wired