After a Fatal Error has been reported by a device and has been recovered through a Secondary Bus Reset, AER updates the device's error_state to pci_channel_io_normal before invoking its driver's ->resume() callback.
By contrast, EEH updates the error_state earlier, namely after resetting the device and before invoking its driver's ->slot_reset() callback. Commit c58dc575f3c8 ("powerpc/pseries: Set error_state to pci_channel_io_normal in eeh_report_reset()") explains in great detail that the earlier invocation is necessitated by various drivers checking accessibility of the device with pci_channel_offline() and avoiding accesses if it returns true. It returns true for any other error_state than pci_channel_io_normal. The device should be accessible already after reset, hence the reasoning is that it's safe to update the error_state immediately afterwards. This deviation between AER and EEH seems problematic because drivers behave differently depending on which error recovery mechanism the platform uses. Three drivers have gone so far as to update the error_state themselves, presumably to work around AER's behavior. For consistency, amend AER to update the error_state at the same recovery steps as EEH. Drop the now unnecessary workaround from the three drivers. Keep updating the error_state before ->resume() in case ->error_detected() or ->mmio_enabled() return PCI_ERS_RESULT_RECOVERED, which causes ->slot_reset() to be skipped. There are drivers doing this even for Fatal Errors, e.g. mhi_pci_error_detected(). Signed-off-by: Lukas Wunner <lu...@wunner.de> --- drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c | 1 - drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c | 2 -- drivers/pci/pcie/err.c | 3 ++- drivers/scsi/qla2xxx/qla_os.c | 5 ----- 4 files changed, 2 insertions(+), 9 deletions(-) diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c index d7cdea8f604d..91e7b38143ea 100644 --- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c +++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c @@ -4215,7 +4215,6 @@ static pci_ers_result_t qlcnic_83xx_io_slot_reset(struct pci_dev *pdev) struct qlcnic_adapter *adapter = pci_get_drvdata(pdev); int err = 0; - pdev->error_state = pci_channel_io_normal; err = pci_enable_device(pdev); if (err) goto disconnect; diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c index 53cdd36c4123..e051d8c7a28d 100644 --- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c +++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c @@ -3766,8 +3766,6 @@ static int qlcnic_attach_func(struct pci_dev *pdev) struct qlcnic_adapter *adapter = pci_get_drvdata(pdev); struct net_device *netdev = adapter->netdev; - pdev->error_state = pci_channel_io_normal; - err = pci_enable_device(pdev); if (err) return err; diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c index 930bb60fb761..bebe4bc111d7 100644 --- a/drivers/pci/pcie/err.c +++ b/drivers/pci/pcie/err.c @@ -153,7 +153,8 @@ static int report_slot_reset(struct pci_dev *dev, void *data) device_lock(&dev->dev); pdrv = dev->driver; - if (!pdrv || !pdrv->err_handler || !pdrv->err_handler->slot_reset) + if (!pci_dev_set_io_state(dev, pci_channel_io_normal) || + !pdrv || !pdrv->err_handler || !pdrv->err_handler->slot_reset) goto out; err_handler = pdrv->err_handler; diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c index d4b484c0fd9d..4460421834cb 100644 --- a/drivers/scsi/qla2xxx/qla_os.c +++ b/drivers/scsi/qla2xxx/qla_os.c @@ -7883,11 +7883,6 @@ qla2xxx_pci_slot_reset(struct pci_dev *pdev) "Slot Reset.\n"); ha->pci_error_state = QLA_PCI_SLOT_RESET; - /* Workaround: qla2xxx driver which access hardware earlier - * needs error state to be pci_channel_io_online. - * Otherwise mailbox command timesout. - */ - pdev->error_state = pci_channel_io_normal; pci_restore_state(pdev); -- 2.47.2