Correctable and Uncorrectable Error Status Registers on reporting agents
are cleared upon PCI device enumeration in pci_aer_init() to flush past
events.  They're cleared again when an error is handled by the AER driver.

If an agent reports a new error after pci_aer_init() and before the AER
driver has probed on the corresponding Root Port or Root Complex Event
Collector, that error is not handled by the AER driver:  On probe, the
driver clears the Root Error Status Register, but neglects to re-clear
the Correctable and Uncorrectable Error Status Registers on reporting
agents.

The stale error will eventually be reported when another error occurs,
which is irritating because to an end user it appears as if the earlier
error has just happened.

Amend the AER driver to clear stale errors on reporting agents upon probe.
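
For illustration only, the sequence above can be replayed with a small
userspace model.  This is a sketch, not kernel code:  struct w1c_reg,
w1c_latch(), w1c_flush() and replay() are hypothetical names, and the
registers are assumed to behave as plain write-1-to-clear bitfields.

```c
#include <stdint.h>

/* Hypothetical model of a write-1-to-clear status register (not kernel API) */
struct w1c_reg {
	uint32_t val;
};

/* Hardware latches an error: the bit stays set until software clears it */
static void w1c_latch(struct w1c_reg *r, uint32_t bits)
{
	r->val |= bits;
}

/* Read the register and write the value back, clearing every set bit */
static uint32_t w1c_flush(struct w1c_reg *r)
{
	uint32_t v = r->val;

	r->val &= ~v;	/* writing 1 clears a bit, writing 0 leaves it alone */
	return v;
}

/*
 * Replay the sequence from the commit message.  Returns the agent's
 * status bits seen when the second error is handled.
 */
static uint32_t replay(int clear_agent_on_probe)
{
	struct w1c_reg root = { 0 }, agent = { 0 };

	/* Error A (bit 0) is reported before the AER driver probes */
	w1c_latch(&agent, 0x1);
	w1c_latch(&root, 0x1);

	/* Probe: the Root Error Status is always cleared */
	if (w1c_flush(&root) && clear_agent_on_probe)
		w1c_flush(&agent);	/* the step this patch adds */

	/* Error B (bit 1) is reported later and handled normally */
	w1c_latch(&agent, 0x2);
	w1c_latch(&root, 0x1);
	w1c_flush(&root);
	return w1c_flush(&agent);
}
```

In this model, replay(0) returns both error bits, so the stale error A
shows up alongside error B, whereas replay(1) returns only the bit for
error B.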

Skip reporting agents which have not invoked pci_aer_init() yet to avoid
using an uninitialized pdev->aer_cap.  They're recognizable by the error
bits in the Device Control register still being clear.

Reporting agents may execute pci_aer_init() after the AER driver has
probed, particularly when devices are hotplugged or removed/rescanned via
sysfs.  For this reason, it continues to be necessary that pci_aer_init()
clears Correctable and Uncorrectable Error Status Registers.

Reported-by: Lucas Van <[email protected]> # off-list
Tested-by: Lucas Van <[email protected]>
Signed-off-by: Lukas Wunner <[email protected]>
---
 drivers/pci/pcie/aer.c | 26 +++++++++++++++++++++++++-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index e0bcaa8..4299c55 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -1608,6 +1608,20 @@ static void aer_disable_irq(struct pci_dev *pdev)
        pci_write_config_dword(pdev, aer + PCI_ERR_ROOT_COMMAND, reg32);
 }
 
+static int clear_status_iter(struct pci_dev *dev, void *data)
+{
+       u16 devctl;
+
+       /* Skip if pci_enable_pcie_error_reporting() hasn't been called yet */
+       pcie_capability_read_word(dev, PCI_EXP_DEVCTL, &devctl);
+       if (!(devctl & PCI_EXP_AER_FLAGS))
+               return 0;
+
+       pci_aer_clear_status(dev);
+       pcie_clear_device_status(dev);
+       return 0;
+}
+
 /**
  * aer_enable_rootport - enable Root Port's interrupts when receiving messages
  * @rpc: pointer to a Root Port data structure
@@ -1629,9 +1643,19 @@ static void aer_enable_rootport(struct aer_rpc *rpc)
        pcie_capability_clear_word(pdev, PCI_EXP_RTCTL,
                                   SYSTEM_ERROR_INTR_ON_MESG_MASK);
 
-       /* Clear error status */
+       /* Clear error status of this Root Port or RCEC */
        pci_read_config_dword(pdev, aer + PCI_ERR_ROOT_STATUS, &reg32);
        pci_write_config_dword(pdev, aer + PCI_ERR_ROOT_STATUS, reg32);
+
+       /* Clear error status of agents reporting to this Root Port or RCEC */
+       if (reg32 & AER_ERR_STATUS_MASK) {
+               if (pci_pcie_type(pdev) == PCI_EXP_TYPE_RC_EC)
+                       pcie_walk_rcec(pdev, clear_status_iter, NULL);
+               else if (pdev->subordinate)
+                       pci_walk_bus(pdev->subordinate, clear_status_iter,
+                                    NULL);
+       }
+
        pci_read_config_dword(pdev, aer + PCI_ERR_COR_STATUS, &reg32);
        pci_write_config_dword(pdev, aer + PCI_ERR_COR_STATUS, reg32);
        pci_read_config_dword(pdev, aer + PCI_ERR_UNCOR_STATUS, &reg32);
-- 
2.51.0

