Public bug reported:
[Impact]
* The mpt3sas driver relies on a FW queue (the Reply Post Descriptor
Queue) in the I/O completion path. The driver uses an MMIO register,
called the Reply Post Host Index, to flag an empty entry in that queue.
This value is updated during the driver interrupt routine [in the
_base_interrupt() function].
* There are 2 registers representing the Reply Post Host Index, and
which one applies depends on the type of the adapter. The driver tells
them apart through the "ioc->combined_reply_queue" check. By the MPI
specification (vendor spec), the driver should use the combined reply
queue according to the maximum number of MSI-X vectors that the adapter
exposes and the spec version (SAS 3.0 vs SAS 3.5).
* Currently, this check is wrong for a class of adapters; it was fixed
in the upstream kernel commit 2b48be65685a [0]. Without this commit, we
can observe spontaneous resets in the driver due to queue overflow (the
FW is not aware that there are free entries in the Reply Post Descriptor
Queue). The dmesg log will show the following output in case of this
error:
mpt3sas_cm0: fault_state(0x2100)!
mpt3sas_cm0: sending diag reset !!
mpt3sas_cm0: diag reset: SUCCESS
[followed by a lot of driver messages as result of the reset procedure]
* During these resets, I/O is stalled, so performance may be affected.
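The register selection described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the driver's code: the types, the function names and the per-version limits are all hypothetical, and the real vector-count limits come from the MPI specification and the upstream commit.

```c
#include <stdbool.h>

/* All names below are hypothetical; this is not the mpt3sas driver's code. */
enum sas_gen { SAS_GEN_3_0, SAS_GEN_3_5 };

enum reply_index_reg {
    REG_REPLY_POST_HOST_INDEX,          /* non-combined register */
    REG_COMBINED_REPLY_POST_HOST_INDEX  /* combined reply queue register */
};

struct adapter_info {
    enum sas_gen gen;           /* spec version the adapter implements */
    unsigned int msix_vectors;  /* maximum MSI-X vectors the adapter exposes */
};

/* Assumed per-version limits, supplied by the caller (placeholders only). */
struct combined_limits {
    unsigned int gen30_limit;
    unsigned int gen35_limit;
};

/* Assumption: the combined queue is used when the vector count exceeds
 * the limit that belongs to the adapter's own spec version. */
static bool use_combined_reply_queue(const struct adapter_info *a,
                                     const struct combined_limits *lim)
{
    unsigned int limit = (a->gen == SAS_GEN_3_5) ? lim->gen35_limit
                                                 : lim->gen30_limit;
    return a->msix_vectors > limit;
}

/* Pick the register that the interrupt path would write the index to. */
static enum reply_index_reg pick_index_register(const struct adapter_info *a,
                                                const struct combined_limits *lim)
{
    return use_combined_reply_queue(a, lim)
               ? REG_COMBINED_REPLY_POST_HOST_INDEX
               : REG_REPLY_POST_HOST_INDEX;
}
```

In this model, applying the limit of the wrong spec version to an adapter whose vector count lies between the two limits selects the wrong register, so the FW never sees the index updates. That is the kind of mismatch this report describes.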
[Test Case]
* The problem is not trivial to reproduce, but given a machine with an
affected device, an I/O benchmark like FIO can be used to stress the
I/O path and trigger the issue. We have reports that the adapter
"LSI Logic / Symbios Logic Device [1000:00ac]" is affected by the issue.
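While the benchmark runs, the kernel log can be watched for the reset signature quoted in the [Impact] section. The helper below is a minimal sketch of such a check; the function names are hypothetical, and the matched substrings are taken from the dmesg lines in this report, so other driver versions may word them differently.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true if the line looks like one of the mpt3sas fault/reset
 * messages quoted in this report. Substring matching only. */
static bool is_mpt3sas_reset_line(const char *line)
{
    if (strstr(line, "mpt3sas") == NULL)
        return false;
    return strstr(line, "fault_state(") != NULL ||
           strstr(line, "sending diag reset") != NULL;
}

/* Counts matching lines in an array of already-split log lines. */
static unsigned int count_reset_lines(const char *const *lines, size_t n)
{
    unsigned int hits = 0;
    for (size_t i = 0; i < n; i++) {
        if (is_mpt3sas_reset_line(lines[i]))
            hits++;
    }
    return hits;
}
```

A small wrapper that reads dmesg output line by line (for example with fgets() on stdin) and calls is_mpt3sas_reset_line() on each line would be enough to tell whether a test run hit the problem.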
[Regression Potential]
* This is a long-standing issue in the mpt3sas driver, affecting only a
class of adapters from this vendor. Since it's a clear bug, the fix is
necessary. The regression potential is unknown, but likely low: the
patch only changes the register used for the index updates on adapters
with a specific set of characteristics (according to the spec), which
further restricts its scope.
** Affects: linux (Ubuntu)
Importance: Critical
Assignee: Guilherme G. Piccoli (gpiccoli)
Status: Confirmed
** Affects: linux (Ubuntu Xenial)
Importance: Undecided
Status: New
** Affects: linux (Ubuntu Bionic)
Importance: Undecided
Status: New
** Affects: linux (Ubuntu Cosmic)
Importance: Undecided
Status: New
** Affects: linux (Ubuntu Disco)
Importance: Critical
Assignee: Guilherme G. Piccoli (gpiccoli)
Status: Confirmed
** Tags: sts
--
https://bugs.launchpad.net/bugs/1810781
Title:
mpt3sas - driver using the wrong register to update a queue index in
FW
Status in linux package in Ubuntu:
Confirmed
Status in linux source package in Xenial:
New
Status in linux source package in Bionic:
New
Status in linux source package in Cosmic:
New
Status in linux source package in Disco:
Confirmed