This bug is awaiting verification that the kernel in -proposed solves
the problem. Please test the kernel and update this bug with the
results. If the problem is solved, change the tag 'verification-needed-cosmic'
to 'verification-done-cosmic'. If the problem still exists, change the tag
'verification-needed-cosmic' to 'verification-failed-cosmic'.

If verification is not done within 5 working days from today, this fix will
be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how
to enable and use -proposed. Thank you!


** Tags added: verification-needed-cosmic

** Tags added: verification-needed-bionic

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1810781

Title:
  mpt3sas - driver using the wrong register to update a queue index in
  FW

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  Won't Fix
Status in linux source package in Bionic:
  Fix Committed
Status in linux source package in Cosmic:
  Fix Committed
Status in linux source package in Disco:
  Fix Released

Bug description:
  [Impact]

  * Adapter resets periodically during high-load activity.

  * I/O stalls until the reset/reinit completes (increasing latency), and
  I/O performance degrades across the cluster (e.g., low throughput when
  data is spread over nodes).

  * The mpt3sas driver relies on a FW queue (the Reply Post Descriptor
  Queue) in the I/O completion path; the driver uses an MMIO register,
  called Reply Post Host Index, to flag an empty entry in this queue.
  This value is updated in the driver's interrupt routine (the
  _base_interrupt() function).

  * There are actually two registers representing the Reply Post Host
  Index, depending on the type of the adapter. The driver distinguishes
  them through the "ioc->combined_reply_queue" check. Per the MPI
  specification (vendor spec), the driver should use the combined reply
  queue depending on the maximum number of MSI-X vectors the adapter
  exposes and on the spec version (SAS 3.0 vs SAS 3.5).

  * Currently, this check is wrong for a class of adapters; this was fixed
  in upstream kernel commit 2b48be65685a [0]. Without this commit, the
  driver can reset spontaneously due to a queue overflow (the FW is not
  aware that there are free entries in the Reply Post Descriptor Queue).
  In that case, the dmesg log shows the following output:

    mpt3sas_cm0: fault_state(0x2100)!
    mpt3sas_cm0: sending diag reset !!
    mpt3sas_cm0: diag reset: SUCCESS
  [followed by many driver messages resulting from the reset procedure]

  * I/O is stalled during these resets, which may affect performance.
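
  To illustrate why writing the index to the wrong register ends in an
  overflow, here is a minimal sketch in C of a simplified ring-buffer
  model. It is not the mpt3sas driver or FW code: QUEUE_DEPTH,
  reply_queue_model, fw_free_slots() and simulate() are hypothetical, and
  the model assumes the FW computes free entries only from the host index
  held in the register it actually reads.

  ```c
  #include <assert.h>
  #include <stdbool.h>
  #include <stdint.h>

  /* Hypothetical queue depth; real adapters use much deeper queues. */
  #define QUEUE_DEPTH 8u

  /* Simplified, hypothetical model of the Reply Post Descriptor Queue. */
  struct reply_queue_model {
      uint32_t fw_producer;     /* next slot the FW will fill */
      uint32_t host_index_seen; /* host index in the register the FW reads */
  };

  /* Free slots as seen by the FW. One slot is kept unused so that a full
   * queue can be told apart from an empty one (a model assumption). */
  static uint32_t fw_free_slots(const struct reply_queue_model *q)
  {
      assert(q->fw_producer < QUEUE_DEPTH && q->host_index_seen < QUEUE_DEPTH);
      return (q->host_index_seen + QUEUE_DEPTH - q->fw_producer - 1) % QUEUE_DEPTH;
  }

  /* Post up to `completions` replies. The driver consumes each reply at
   * once; if `use_fw_register` is false, it writes its index to a register
   * the FW does not read (the bug). Returns how many replies the FW could
   * post before it saw a full queue. */
  static uint32_t simulate(uint32_t completions, bool use_fw_register)
  {
      struct reply_queue_model q = {0, 0};
      uint32_t host_index = 0;     /* driver's own consumer index */
      uint32_t other_register = 0; /* stands in for the register the FW ignores */

      for (uint32_t i = 0; i < completions; i++) {
          if (fw_free_slots(&q) == 0)
              return i; /* FW sees no free entry: the overflow case */
          q.fw_producer = (q.fw_producer + 1) % QUEUE_DEPTH; /* FW posts a reply */
          host_index = (host_index + 1) % QUEUE_DEPTH;       /* driver consumes it */
          if (use_fw_register)
              q.host_index_seen = host_index;
          else
              other_register = host_index;
      }
      (void)other_register;
      return completions;
  }
  ```

  In this model, simulate(100, true) posts all 100 replies, while
  simulate(100, false) stops after 7 (QUEUE_DEPTH - 1): the driver has
  consumed every entry, but the FW never learns about the free slots.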

  [Test Case]

  * Reproducing the problem is not trivial, but on a machine with an
  affected device, an I/O benchmark like FIO could be used to exercise
  the I/O path heavily and trigger the issue.

  * We have reports that the adapter "LSI Logic / Symbios Logic Device
  [1000:00ac]" is affected by the issue, and that this commit resolves
  the problem.

  [Regression Potential]

  * This is a long-standing issue in the mpt3sas driver, affecting only a
  class of adapters from this vendor. Since it is clearly a bug, the fix
  is necessary. The potential for regressions is unknown but likely low:
  the patch changes the register used for index updates only for adapters
  with a specific set of characteristics (according to the spec), which
  further restricts its scope.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1810781/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
