Public bug reported:
[Impact]
APEI (ACPI Platform Error Interface) is supposed to report PCIe errors to the
AER (Advanced Error Reporting) driver, which surfaces them to userspace.
However, we currently report only "recoverable" errors and not errors of other
severities (e.g. correctable), thus hiding signs of faulty hardware from the
user.
[Test Case]
$ sudo apt install rasdaemon
# On a system that supports ACPI EINJ (dmesg | grep "ACPI: EINJ"), use the
# attached script to inject a correctable PCIe error.
$ sudo ras-mc-ctl --errors
# There should be an entry for the injected error, as shown below:
No Memory errors.
PCIe AER events:
1 2018-05-07 17:55:46 +0000 Fatal error: Receiver Error
No Extlog errors.
No MCE errors.
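The attached script is not reproduced here. As a rough, hypothetical sketch of the same steps, the shell functions below write to the kernel's EINJ debugfs files (`available_error_type`, `error_type`, `error_inject` under `/sys/kernel/debug/apei/einj`) to inject a PCIe correctable error (ACPI EINJ error type 0x40), and count the entries listed under "PCIe AER events:" in `ras-mc-ctl --errors` output. The function names, the `EINJ_DIR` override, and the output parsing (which assumes the layout of the sample output above) are illustrative assumptions, not part of rasdaemon or of this bug's tooling. The sketch also assumes the `einj` module is loaded and debugfs is mounted, and it leaves `flags`/`param4` untouched, which the kernel's EINJ documentation describes for choosing the target PCIe device.

```shell
#!/bin/sh
# Hypothetical helpers for the test case; NOT the script attached to this bug.
# Assumes the einj module is loaded and debugfs is mounted.

# Override EINJ_DIR to dry-run against a scratch directory.
EINJ_DIR="${EINJ_DIR:-/sys/kernel/debug/apei/einj}"
PCIE_CORRECTABLE=0x00000040   # ACPI EINJ error type "PCI Express Correctable"

# Inject one PCIe correctable error through the EINJ debugfs files.
# Target selection (flags/param4) is deliberately not configured here.
inject_pcie_correctable() {
    if [ ! -d "$EINJ_DIR" ]; then
        echo "no EINJ interface at $EINJ_DIR" >&2
        return 1
    fi
    # Proceed only if the firmware advertises this error type.
    if ! grep -q "^$PCIE_CORRECTABLE" "$EINJ_DIR/available_error_type"; then
        echo "firmware does not offer PCIe correctable injection" >&2
        return 1
    fi
    echo "$PCIE_CORRECTABLE" > "$EINJ_DIR/error_type" || return 1
    # Writing an integer to error_inject triggers the injection.
    echo 1 > "$EINJ_DIR/error_inject"
}

# Print how many entries appear under "PCIe AER events:" in
# `ras-mc-ctl --errors` output read from stdin. The layout is assumed to
# match the sample output in this report.
count_aer_events() {
    awk '
        /^PCIe AER events:/  { in_aer = 1; next }
        in_aer && /^[0-9]+ / { n++; next }
        in_aer               { in_aer = 0 }
        END                  { print n + 0 }
    '
}
```

Typical use from a root shell, after sourcing the block: run `inject_pcie_correctable`, then `ras-mc-ctl --errors | count_aer_events`. With the fix applied, the count is expected to grow after each injection.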
[Regression Risk]
** Affects: linux (Ubuntu)
Importance: Undecided
Assignee: dann frazier (dannf)
Status: In Progress
** Affects: linux (Ubuntu Bionic)
Importance: Undecided
Assignee: dann frazier (dannf)
Status: In Progress
** Changed in: linux (Ubuntu)
Status: New => In Progress
** Also affects: linux (Ubuntu Bionic)
Importance: Undecided
Status: New
** Changed in: linux (Ubuntu Bionic)
Status: New => In Progress
** Changed in: linux (Ubuntu Bionic)
Assignee: (unassigned) => dann frazier (dannf)
** Changed in: linux (Ubuntu)
Assignee: (unassigned) => dann frazier (dannf)
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1769730
Title:
Some PCIe errors not surfaced through rasdaemon
Status in linux package in Ubuntu:
In Progress
Status in linux source package in Bionic:
In Progress