This bug is missing log files that will aid in diagnosing the problem.
While running an Ubuntu kernel (not a mainline or third-party kernel),
please enter the following command in a terminal window:

apport-collect 1814095

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable
to run this command, please add a comment stating that fact and change
the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the
Ubuntu Kernel Team.

** Changed in: linux (Ubuntu)
       Status: New => Incomplete

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1814095

Title:
  bnxt_en_po: TX timed out triggering Netdev Watchdog Timer

Status in linux package in Ubuntu:
  Confirmed

Bug description:
  The following 25Gb Broadcom NIC error was seen once on Xenial
  running the 4.4.0-141-generic kernel on an amd64 host
  under moderate-to-heavy network traffic:

  * The bnxt_en_bpo driver froze on a "TX timed out" error
    and triggered the netdev watchdog timer under load.

  * From kernel log:
    "NETDEV WATCHDOG: eno2d1 (bnxt_en_bpo): transmit queue 0 timed out"
    See the attached kern.log excerpt for the full error log.

  * Release = Xenial 
    Kernel = 4.4.0-141-generic #167
    eno2d1 = Product Name: Broadcom Adv. Dual 25Gb Ethernet
    
  * This caused the driver to reset in order to recover:
    
    "bnxt_en_bpo 0000:19:00.1 eno2d1: TX timeout detected, starting reset task!"
   
    driver: bnxt_en_bpo
    version: 1.8.1
    source: ubuntu/bnxt/bnxt.c: bnxt_tx_timeout()

  * The loss of connectivity and softirq stall caused other failures
    on the system. 

  * The bnxt_en_bpo driver is the imported Broadcom driver,
    pulled in to support newer Broadcom HW (specific boards),
    while the bnxt_en module continues to support the older
    HW. The current upstream Linux driver does not compile
    easily against the 4.4 kernel (too many changes).

  * This upstream bnxt_en driver fix is a likely solution:
     "bnxt_en: Fix TX timeout during netpoll"
     commit: 73f21c653f930f438d53eed29b5e4c65c8a0f906
    
    This fix has not been applied to the bnxt_en_bpo driver,
    but a review of the code indicates that it is susceptible
    to the same bug, so applying the fix would be reasonable.

  * There is no easy way to reproduce this.
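
  Because the failure is hard to reproduce, it may help to scan saved
  kernel logs for earlier occurrences. The following is a minimal
  sketch, not part of the original report: it assumes the watchdog
  message has the same form as the line quoted above, and the helper
  name find_tx_timeouts is hypothetical.

```python
import re

# Pattern modeled on the kernel log line quoted in this report:
#   NETDEV WATCHDOG: eno2d1 (bnxt_en_bpo): transmit queue 0 timed out
WATCHDOG_RE = re.compile(
    r"NETDEV WATCHDOG: (?P<iface>\S+) \((?P<driver>[^)]+)\): "
    r"transmit queue (?P<queue>\d+) timed out"
)


def find_tx_timeouts(lines):
    """Return (interface, driver, queue) for each watchdog TX timeout line.

    `lines` is any iterable of strings, e.g. an open kern.log file.
    Lines that do not match the watchdog pattern are skipped.
    """
    hits = []
    for line in lines:
        match = WATCHDOG_RE.search(line)
        if match:
            hits.append(
                (match.group("iface"), match.group("driver"), int(match.group("queue")))
            )
    return hits


# Example using the message quoted in the bug description.
sample = ["NETDEV WATCHDOG: eno2d1 (bnxt_en_bpo): transmit queue 0 timed out"]
for iface, driver, queue in find_tx_timeouts(sample):
    print(f"{iface} ({driver}): TX queue {queue} timed out")
```

  To scan a real log, pass an open file object (for example, the
  attached kern.log excerpt) instead of the sample list.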

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1814095/+subscriptions
