** Changed in: linux (Ubuntu Bionic)
       Status: Confirmed => Fix Committed

** Changed in: linux (Ubuntu Disco)
       Status: Confirmed => Fix Committed

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1855409

Title:
  qede driver causes 100% CPU load

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  Invalid
Status in linux source package in Bionic:
  Fix Committed
Status in linux source package in Disco:
  Fix Committed
Status in linux source package in Eoan:
  Fix Released
Status in linux source package in Focal:
  Fix Released

Bug description:
  [Impact]

  * The PTP feature in the qede driver is implemented in such a way that,
  if the NIC firmware takes some time to perform the timestamping, the
  PTP worker function reschedules itself indefinitely until the value
  read from a device register is meaningful. With that behavior, if a
  userspace tool requests a badly configured TX/RX filter (or if the NIC
  firmware has any other timestamping issue), the function
  qede_ptp_task() will reschedule itself forever and cause unbounded
  resource consumption. This manifests as a kworker thread consuming
  100% of CPU.

  * The dmesg log will show a message like this:
  "qede_ptp_tx_ts:533(eno3)]Timestamping in progress"

  Also, using perf, the user can observe a stack like the following:
  - 44.76% 0.00% kworker/16:5 [kernel.kallsyms]
       ret_from_fork
     - kthread
        - 44.74% worker_thread
           - 44.57% process_one_work
              - 42.67% qede_ptp_task
                 - 38.86% qed_ptp_hw_read_tx_ts
                      qed_rd
                 - 3.03% queue_work_on
                    - 2.06% __queue_work
                       - 0.68% get_work_pool
                          - 0.61% radix_tree_lookup
                               __radix_tree_lookup
                0.50% set_work_pool_and_clear_pending

  * The patch proposed in this SRU request refactors the PTP worker in
  qede by adding a time limit, after which the task no longer reschedules
  itself and the timestamp procedure fails: 9adebac37e7d ("qede:
  Handle infinite driver spinning for Tx timestamp.")
  http://git.kernel.org/linus/9adebac37e7d

  Besides fixing the issue, it also adds an ethtool statistic for
  accounting the PTP errors.
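
  The difference between the two behaviors can be illustrated with a
  small user-space simulation. This is only a sketch under stated
  assumptions, not the driver code: the struct, the function names, the
  tick-based clock and the tx_skipped counter are all hypothetical
  stand-ins chosen for illustration. The worker polls a simulated
  register, asks to be rescheduled while no timestamp is available, and
  gives up (bumping an error counter) once a time limit has elapsed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; none of these names come from the qede driver. */
struct ptp_sim {
    uint64_t now;        /* simulated clock, in ticks */
    uint64_t start;      /* tick at which the TX timestamp was requested */
    uint64_t timeout;    /* give up after this many ticks */
    uint64_t ready_at;   /* tick at which the register becomes valid;
                            UINT64_MAX means the firmware never answers */
    unsigned tx_skipped; /* error counter, in the spirit of an ethtool stat */
};

enum ptp_result { PTP_DONE, PTP_RESCHEDULE, PTP_TIMED_OUT };

/* Returns true once the simulated firmware has produced a timestamp. */
static bool sim_read_tx_ts(const struct ptp_sim *s)
{
    return s->now >= s->ready_at;
}

/* One invocation of the worker: done, try again later, or give up. */
static enum ptp_result ptp_task_once(struct ptp_sim *s)
{
    if (sim_read_tx_ts(s))
        return PTP_DONE;

    /* Without this check the worker would ask to be rescheduled forever. */
    if (s->now - s->start >= s->timeout) {
        s->tx_skipped++;
        return PTP_TIMED_OUT;
    }
    return PTP_RESCHEDULE;
}

/* Drives the worker until it stops rescheduling itself.
 * Returns how many times the worker ran; the final state goes to *out. */
static unsigned run_worker(struct ptp_sim *s, enum ptp_result *out)
{
    unsigned runs = 0;
    enum ptp_result r;

    s->start = s->now;
    do {
        r = ptp_task_once(s);
        runs++;
        s->now++; /* each pass through the work queue costs one tick */
    } while (r == PTP_RESCHEDULE);

    *out = r;
    return runs;
}
```

  With ready_at set to UINT64_MAX, run_worker() stops after a bounded
  number of invocations and records one skipped timestamp, whereas
  removing the timeout check from ptp_task_once() would make the loop
  spin forever, which corresponds to the 100% CPU kworker described
  above.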

  [Test case]

  By using chrony in Bionic, the following steps will reproduce the
  issue:

  a) Install chrony on Bionic in a system with a working NIC managed by qede;
  b) Edit the chrony configuration and add "hwtimestamp *" to the top of its conf file;
  c) Restart the chrony service.

  Check dmesg for the "[...]Timestamping in progress" message and the
  overall CPU workload using a tool like "top" to observe a kthread
  consuming 100% of CPU.

  [Regression potential]

  The patch scope is restricted to the qede PTP handler, and the patch has
  been upstream for more than 7 months. If there is any possibility of
  regression, the worst case would be an issue affecting packet
  timestamping, without interfering with the regular xmit path of the driver.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1855409/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
