After updating a server with an Intel 10GBASE-T NIC (ixgbe driver) from
linux-4.4.1 to linux-4.6.1 (vanilla, stable), we hit the following bug
after about two days of operation:

Jun  6 19:09:31 computer kernel: AMD-Vi: Event logged [IO_PAGE_FAULT
device=04:00.0 domain=0x000e address=0x000000001004ecc0 flags=0x0050]
Jun  6 19:09:31 computer kernel: AMD-Vi: Event logged [IO_PAGE_FAULT
device=04:00.0 domain=0x000e address=0x000000001004ed00 flags=0x0050]
Jun  6 19:09:35 computer kernel: ixgbe 0000:04:00.0 enp4s0: Detected Tx Unit
Hang#012  Tx Queue             <3>#012  TDH, TDT             <1ce>,
<1e6>#012  next_to_use          <1e6>#012  next_to_clean       
<1ce>#012tx_buffer_info[next_to_clean]#012  time_stamp          
<10f7b215d>#012  jiffies              <10f7b3244>
Jun  6 19:09:35 computer kernel: ixgbe 0000:04:00.0 enp4s0: Detected Tx Unit
Hang#012  Tx Queue             <1>#012  TDH, TDT             <fc>, <108>#012
 next_to_use          <108>#012  next_to_clean       
<fc>#012tx_buffer_info[next_to_clean]#012  time_stamp          
<10f7b28c5>#012  jiffies              <10f7b3244>
Jun  6 19:09:35 computer kernel: ixgbe 0000:04:00.0 enp4s0: Detected Tx Unit
Hang#012  Tx Queue             <0>#012  TDH, TDT             <16b>,
<16f>#012  next_to_use          <16f>#012  next_to_clean       
<16b>#012tx_buffer_info[next_to_clean]#012  time_stamp          
<10f7b21d0>#012  jiffies              <10f7b3244>
Jun  6 19:09:35 computer kernel: ixgbe 0000:04:00.0 enp4s0: Detected Tx Unit
Hang#012  Tx Queue             <4>#012  TDH, TDT             <69>, <8b>#012
 next_to_use          <8b>#012  next_to_clean       
<69>#012tx_buffer_info[next_to_clean]#012  time_stamp          
<10f7b215d>#012  jiffies              <10f7b3244>
Jun  6 19:09:35 computer kernel: ixgbe 0000:04:00.0 enp4s0: tx hang 1
detected on queue 1, resetting adapter
Jun  6 19:09:35 computer kernel: ixgbe 0000:04:00.0 enp4s0: tx hang 1
detected on queue 0, resetting adapter
Jun  6 19:09:35 computer kernel: ixgbe 0000:04:00.0 enp4s0: Detected Tx Unit
Hang#012  Tx Queue             <10>#012  TDH, TDT             <1c3>,
<1c9>#012  next_to_use          <1c9>#012  next_to_clean       
<1c3>#012tx_buffer_info[next_to_clean]#012  time_stamp          
<10f7b215d>#012  jiffies              <10f7b3244>
Jun  6 19:09:35 computer kernel: ixgbe 0000:04:00.0 enp4s0: tx hang 1
detected on queue 4, resetting adapter
Jun  6 19:09:35 computer kernel: ixgbe 0000:04:00.0 enp4s0: initiating reset
due to tx timeout
Jun  6 19:09:35 computer kernel: ixgbe 0000:04:00.0 enp4s0: initiating reset
due to tx timeout
Jun  6 19:09:35 computer kernel: ixgbe 0000:04:00.0 enp4s0: tx hang 1
detected on queue 10, resetting adapter
Jun  6 19:09:35 computer kernel: ixgbe 0000:04:00.0 enp4s0: initiating reset
due to tx timeout
Jun  6 19:09:35 computer kernel: ixgbe 0000:04:00.0 enp4s0: initiating reset
due to tx timeout
Jun  6 19:09:35 computer kernel: ixgbe 0000:04:00.0 enp4s0: Reset adapter
Jun  6 19:09:35 computer kernel: ixgbe 0000:04:00.0 enp4s0: tx hang 2
detected on queue 3, resetting adapter
Jun  6 19:09:35 computer kernel: ixgbe 0000:04:00.0: master disable timed out
Jun  6 19:09:36 computer kernel: br0: port 1(enp4s0) entered disabled state
Jun  6 19:09:42 computer kernel: ixgbe 0000:04:00.0 enp4s0: NIC Link is Up
10 Gbps, Flow Control: RX/TX
Jun  6 19:09:42 computer kernel: br0: port 1(enp4s0) entered blocking state
Jun  6 19:09:42 computer kernel: br0: port 1(enp4s0) entered forwarding state
Jun  6 19:09:44 computer kernel: ixgbe 0000:04:00.0 enp4s0: Detected Tx Unit
Hang#012  Tx Queue             <12>#012  TDH, TDT             <0>, <2>#012 
next_to_use          <2>#012  next_to_clean       
<0>#012tx_buffer_info[next_to_clean]#012  time_stamp          
<10f7b4c20>#012  jiffies              <10f7b544c>
Jun  6 19:09:44 computer kernel: ixgbe 0000:04:00.0 enp4s0: tx hang 2
detected on queue 12, resetting adapter
Jun  6 19:09:44 computer kernel: ixgbe 0000:04:00.0 enp4s0: initiating reset
due to tx timeout
Jun  6 19:09:44 computer kernel: ixgbe 0000:04:00.0 enp4s0: Reset adapter
Jun  6 19:09:44 computer kernel: ixgbe 0000:04:00.0 enp4s0: RXDCTL.ENABLE on
Rx queue 0 not cleared within the polling period
Jun  6 19:09:44 computer kernel: ixgbe 0000:04:00.0 enp4s0: RXDCTL.ENABLE on
Rx queue 1 not cleared within the polling period
Jun  6 19:09:44 computer kernel: ixgbe 0000:04:00.0 enp4s0: RXDCTL.ENABLE on
Rx queue 2 not cleared within the polling period
Jun  6 19:09:44 computer kernel: ixgbe 0000:04:00.0 enp4s0: RXDCTL.ENABLE on
Rx queue 3 not cleared within the polling period
Jun  6 19:09:44 computer kernel: ixgbe 0000:04:00.0 enp4s0: RXDCTL.ENABLE on
Rx queue 4 not cleared within the polling period
Jun  6 19:09:44 computer kernel: ixgbe 0000:04:00.0 enp4s0: RXDCTL.ENABLE on
Rx queue 5 not cleared within the polling period
Jun  6 19:09:44 computer kernel: ixgbe 0000:04:00.0 enp4s0: RXDCTL.ENABLE on
Rx queue 6 not cleared within the polling period
Jun  6 19:09:44 computer kernel: ixgbe 0000:04:00.0 enp4s0: RXDCTL.ENABLE on
Rx queue 7 not cleared within the polling period
Jun  6 19:09:44 computer kernel: ixgbe 0000:04:00.0 enp4s0: RXDCTL.ENABLE on
Rx queue 8 not cleared within the polling period
Jun  6 19:09:44 computer kernel: ixgbe 0000:04:00.0 enp4s0: RXDCTL.ENABLE on
Rx queue 9 not cleared within the polling period
Jun  6 19:09:44 computer kernel: ixgbe 0000:04:00.0 enp4s0: RXDCTL.ENABLE on
Rx queue 10 not cleared within the polling period
Jun  6 19:09:44 computer kernel: ixgbe 0000:04:00.0 enp4s0: RXDCTL.ENABLE on
Rx queue 11 not cleared within the polling period
Jun  6 19:09:44 computer kernel: ixgbe 0000:04:00.0 enp4s0: RXDCTL.ENABLE on
Rx queue 12 not cleared within the polling period
Jun  6 19:09:44 computer kernel: ixgbe 0000:04:00.0 enp4s0: RXDCTL.ENABLE on
Rx queue 13 not cleared within the polling period
Jun  6 19:09:44 computer kernel: ixgbe 0000:04:00.0 enp4s0: RXDCTL.ENABLE on
Rx queue 14 not cleared within the polling period
Jun  6 19:09:44 computer kernel: ixgbe 0000:04:00.0 enp4s0: RXDCTL.ENABLE on
Rx queue 15 not cleared within the polling period
Jun  6 19:09:44 computer kernel: ixgbe 0000:04:00.0 enp4s0: RXDCTL.ENABLE on
Rx queue 16 not cleared within the polling period
Jun  6 19:09:44 computer kernel: ixgbe 0000:04:00.0 enp4s0: RXDCTL.ENABLE on
Rx queue 17 not cleared within the polling period
Jun  6 19:09:44 computer kernel: ixgbe 0000:04:00.0 enp4s0: RXDCTL.ENABLE on
Rx queue 18 not cleared within the polling period
Jun  6 19:09:44 computer kernel: ixgbe 0000:04:00.0 enp4s0: RXDCTL.ENABLE on
Rx queue 19 not cleared within the polling period
Jun  6 19:09:44 computer kernel: ixgbe 0000:04:00.0 enp4s0: RXDCTL.ENABLE on
Rx queue 20 not cleared within the polling period
Jun  6 19:09:44 computer kernel: ixgbe 0000:04:00.0 enp4s0: RXDCTL.ENABLE on
Rx queue 21 not cleared within the polling period
Jun  6 19:09:44 computer kernel: ixgbe 0000:04:00.0 enp4s0: RXDCTL.ENABLE on
Rx queue 22 not cleared within the polling period
Jun  6 19:09:44 computer kernel: ixgbe 0000:04:00.0 enp4s0: RXDCTL.ENABLE on
Rx queue 23 not cleared within the polling period
Jun  6 19:09:44 computer kernel: ixgbe 0000:04:00.0 enp4s0: RXDCTL.ENABLE on
Rx queue 24 not cleared within the polling period
Jun  6 19:09:44 computer kernel: ixgbe 0000:04:00.0 enp4s0: RXDCTL.ENABLE on
Rx queue 25 not cleared within the polling period
Jun  6 19:09:44 computer kernel: ixgbe 0000:04:00.0 enp4s0: RXDCTL.ENABLE on
Rx queue 26 not cleared within the polling period
Jun  6 19:09:44 computer kernel: ixgbe 0000:04:00.0 enp4s0: RXDCTL.ENABLE on
Rx queue 27 not cleared within the polling period
Jun  6 19:09:44 computer kernel: ixgbe 0000:04:00.0 enp4s0: RXDCTL.ENABLE on
Rx queue 28 not cleared within the polling period
Jun  6 19:09:44 computer kernel: ixgbe 0000:04:00.0 enp4s0: RXDCTL.ENABLE on
Rx queue 29 not cleared within the polling period
Jun  6 19:09:44 computer kernel: ixgbe 0000:04:00.0 enp4s0: RXDCTL.ENABLE on
Rx queue 30 not cleared within the polling period
Jun  6 19:09:44 computer kernel: ixgbe 0000:04:00.0 enp4s0: RXDCTL.ENABLE on
Rx queue 31 not cleared within the polling period
Jun  6 19:09:44 computer kernel: ixgbe 0000:04:00.0: master disable timed out
Jun  6 19:09:44 computer kernel: ixgbe 0000:04:00.0 enp4s0: RXDCTL.ENABLE on
Rx queue 0 not cleared within the polling period
...
Jun  6 19:09:44 computer kernel: ixgbe 0000:04:00.0 enp4s0: RXDCTL.ENABLE on
Rx queue 31 not cleared within the polling period
Jun  6 19:09:45 computer kernel: br0: port 1(enp4s0) entered disabled state
Jun  6 19:09:50 computer kernel: ixgbe 0000:04:00.0 enp4s0: NIC Link is Up
10 Gbps, Flow Control: RX/TX
Jun  6 19:09:50 computer kernel: br0: port 1(enp4s0) entered blocking state
Jun  6 19:09:50 computer kernel: br0: port 1(enp4s0) entered forwarding state
Jun  6 19:09:53 computer kernel: ixgbe 0000:04:00.0 enp4s0: Detected Tx Unit
Hang#012  Tx Queue             <24>#012  TDH, TDT             <0>, <5>#012 
next_to_use          <5>#012  next_to_clean       
<0>#012tx_buffer_info[next_to_clean]#012  time_stamp          
<10f7b6e20>#012  jiffies              <10f7b767c>
Jun  6 19:09:53 computer kernel: ixgbe 0000:04:00.0 enp4s0: tx hang 3
detected on queue 24, resetting adapter
Jun  6 19:09:53 computer kernel: ixgbe 0000:04:00.0 enp4s0: initiating reset
due to tx timeout
Jun  6 19:09:53 computer kernel: ixgbe 0000:04:00.0 enp4s0: Reset adapter
Jun  6 19:09:53 computer kernel: ixgbe 0000:04:00.0 enp4s0: RXDCTL.ENABLE on
Rx queue 0 not cleared within the polling period
...
Jun  6 19:09:53 computer kernel: ixgbe 0000:04:00.0 enp4s0: RXDCTL.ENABLE on
Rx queue 31 not cleared within the polling period
Jun  6 19:09:53 computer kernel: ixgbe 0000:04:00.0: master disable timed out
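
For anyone decoding the "Detected Tx Unit Hang" records above, here is a
minimal sketch that splits one record on the syslog-escaped newlines (#012)
and derives two numbers from it: how many descriptors sit between TDH and TDT,
and how many jiffies passed since the stalled buffer was time-stamped. It
assumes that the queue number is printed in decimal and all other fields in
hex, and it ignores ring wrap-around. The function names are made up for
illustration; this is not part of any ixgbe tooling.

```python
import re

# First record from the log above, with the mail client's line wraps joined.
RECORD = (
    "Detected Tx Unit Hang#012  Tx Queue             <3>#012"
    "  TDH, TDT             <1ce>, <1e6>#012"
    "  next_to_use          <1e6>#012  next_to_clean        <1ce>#012"
    "tx_buffer_info[next_to_clean]#012  time_stamp           <10f7b215d>#012"
    "  jiffies              <10f7b3244>"
)


def parse_tx_hang(record):
    """Map each field name in a Tx hang record to its raw string value."""
    fields = {}
    for line in record.split("#012"):  # "#012" is an escaped newline
        values = re.findall(r"<([0-9a-fA-F]+)>", line)
        if not values:
            continue  # heading lines carry no <value>
        # "TDH, TDT  <a>, <b>" names two fields on one line.
        names = [name.strip() for name in line.split("<")[0].split(",")]
        for name, value in zip(names, values):
            fields[name] = value
    return fields


def summarize(fields):
    """Derive queue, pending descriptors and stall time in jiffies."""
    return {
        # Assumption: the queue index is decimal, everything else is hex.
        "queue": int(fields["Tx Queue"]),
        # Assumption: no ring wrap-around between head and tail.
        "pending": int(fields["TDT"], 16) - int(fields["TDH"], 16),
        "stalled_ticks": int(fields["jiffies"], 16)
        - int(fields["time_stamp"], 16),
    }


if __name__ == "__main__":
    print(summarize(parse_tx_hang(RECORD)))
```

Converting the tick count to seconds requires the kernel's HZ setting, which
is not stated in this report.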


The ixgbe module was not able to restore the link after this; only "rmmod"
followed by a fresh initialization of the interface restored connectivity.

Any idea what's going wrong here?

Regards,

Lutz Vieweg



_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel