This bug is awaiting verification that the kernel in -proposed solves
the problem. Please test the kernel and update this bug with the
results. If the problem is solved, change the tag
'verification-needed-xenial' to 'verification-done-xenial'. If the
problem still exists, change the tag 'verification-needed-xenial' to
'verification-failed-xenial'.

If verification is not done within 5 working days from today, this fix
will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on
how to enable and use -proposed. Thank you!


** Tags added: verification-needed-xenial

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1832082

Title:
  bnx2x driver causes 100% CPU load

Status in linux package in Ubuntu:
  Fix Committed
Status in linux source package in Xenial:
  Fix Committed
Status in linux source package in Bionic:
  Fix Committed
Status in linux source package in Cosmic:
  Won't Fix
Status in linux source package in Disco:
  Fix Committed
Status in linux source package in Eoan:
  Fix Committed
Status in linux source package in FF-Series:
  Fix Committed

Bug description:
  [Impact]

  * The PTP feature in the bnx2x driver is implemented in such a way that
  if the NIC firmware takes some time to perform the timestamping (which
  is observed as a bad register read in bnx2x_ptp_task()), the PTP
  worker function reschedules itself indefinitely until the value read
  from the register is meaningful. With that behavior, if a userspace
  tool requests a badly configured RX filter from bnx2x (or if the NIC
  firmware has any other timestamping issue), bnx2x_ptp_task() is
  rescheduled forever and causes unbounded resource consumption. This
  manifests as a kworker thread consuming 100% of CPU.

  
  * The dmesg log will show the following message, reporting that other
  packets are being skipped by the timestamping routine because a packet
  is stuck in the timestamping "pipeline":

  "bnx2x: [bnx2x_start_xmit:3862(eno4)]The device supports only a single
  outstanding packet to timestamp, this packet will not be timestamped"

  Also, using ftrace, the user can see that bnx2x_ptp_task() is being
  called very frequently, and by enabling the bnx2x PTP debugging log
  (ethtool -s <iface> msglvl 16777216) it's possible to observe the
  following message flooding the kernel log:

  "bnx2x: [bnx2x_ptp_task:15242(eno4)]There is no valid Tx timestamp
  yet"

  
  * The patch proposed in this SRU request has been accepted upstream and
  is currently (2019-07-03) available in David Miller's linux-net tree:
  git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=3c91f25c2f72
  Besides fixing the issue, it also adds an ethtool statistic to account
  for PTP errors, and it reduces message flooding in case of errors.
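
  The difference between an unbounded and a bounded retry can be
  illustrated with a small toy model. This is only a sketch and not the
  driver code: the function names, the boolean "register read" and the
  retry limit of 8 are assumptions made for illustration, and the
  upstream patch may bound the retries differently.

```python
# Toy model only: this is NOT bnx2x code. All names and the retry limit
# are hypothetical and chosen for illustration.

def make_register(valid_after):
    """Return a fake register read that becomes valid on read number valid_after."""
    reads = 0

    def read():
        nonlocal reads
        reads += 1
        return reads >= valid_after

    return read


def never_valid():
    """Fake register read that never returns a valid timestamp."""
    return False


def poll_unbounded(read_register, max_steps):
    """Model of a worker that reschedules itself until the read is valid.

    max_steps exists only so that the simulation terminates; the behavior
    described in the bug has no such limit.
    """
    runs = 0
    while runs < max_steps:
        runs += 1
        if read_register():
            return runs, True
    return runs, False


def poll_bounded(read_register, max_retries=8):
    """Model of a worker that gives up after max_retries bad reads."""
    for attempt in range(1, max_retries + 1):
        if read_register():
            return attempt, True
    return max_retries, False


if __name__ == "__main__":
    # With a register that never becomes valid, the unbounded worker runs
    # for as long as the simulation allows; the bounded one stops early.
    print(poll_unbounded(never_valid, max_steps=10000))
    print(poll_bounded(never_valid))
```

  In this model the bounded worker trades a possibly missed timestamp
  for a fixed upper limit on the work done per stuck packet.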


  [Test case]

  Reproducing the problem is not difficult; we've used chrony in Bionic
  to trigger the problem. The steps are:

  a) Install chrony on Bionic on a system with a working NIC managed by
  bnx2x;

  b) Edit the chrony configuration and add "hwtimestamp *" to the top of
  its conf file;

  c) Restart the chrony service.

  Then check dmesg for the "[...]single outstanding packet" message, and
  check the overall CPU load with a tool like "top" to observe a kthread
  consuming 100% of CPU.
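
  The dmesg check can also be scripted. The following is a minimal
  sketch that counts the two messages quoted in this bug description in
  a block of kernel log text. The function name, the dictionary keys and
  the regular expressions are assumptions made for illustration; they
  are derived only from the message text quoted above.

```python
import re

# Patterns built from the two messages quoted in the bug description.
# The key names are hypothetical labels, not driver identifiers.
PATTERNS = {
    "single_outstanding": re.compile(
        r"bnx2x: \[bnx2x_start_xmit:\d+\([^)]+\)\].*single\s+outstanding packet"
    ),
    "no_valid_tx_timestamp": re.compile(
        r"bnx2x: \[bnx2x_ptp_task:\d+\([^)]+\)\].*no valid Tx timestamp"
    ),
}


def count_ptp_symptoms(log_text):
    """Count log lines that match each of the quoted bnx2x PTP messages."""
    counts = {name: 0 for name in PATTERNS}
    for line in log_text.splitlines():
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


if __name__ == "__main__":
    sample = (
        "bnx2x: [bnx2x_start_xmit:3862(eno4)]The device supports only a "
        "single outstanding packet to timestamp, this packet will not be "
        "timestamped\n"
        "bnx2x: [bnx2x_ptp_task:15242(eno4)]There is no valid Tx timestamp yet\n"
    )
    print(count_ptp_symptoms(sample))
```

  To use it on a live system, pass it the output of dmesg, for example
  the stdout of subprocess.run(["dmesg"], capture_output=True,
  text=True). Note that dmesg keeps each message on one line, while the
  quotes in this report are wrapped.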

  
  [Regression potential]

  The patch's scope is restricted to the bnx2x PTP handler, and the patch
  was validated by the driver maintainer. If there is any possibility of
  regressions, we believe the worst case would be an issue affecting
  packet timestamping, without touching the driver's regular xmit path.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1832082/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
