On Wed, Jul 30, 2025 at 10:24:07PM +0200, Lukas Wunner wrote: > On Wed, Jul 30, 2025 at 10:01:50PM +0200, Lukas Wunner wrote: > > On Wed, Jul 30, 2025 at 01:20:57PM +0200, Niklas Schnelle wrote: > > > Since commit 7b42d97e99d3 ("PCI/ERR: Always report current recovery > > > status for udev") AER uses the result of error_detected() as parameter > > > to pci_uevent_ers(). As pci_uevent_ers() however does not handle > > > PCI_ERS_RESULT_NEED_RESET this results in a missing uevent for the > > > beginning of recovery if drivers request a reset. Fix this by treating > > > PCI_ERS_RESULT_NEED_RESET as beginning recovery. > > [...] > > > +++ b/drivers/pci/pci-driver.c > > > @@ -1592,6 +1592,7 @@ void pci_uevent_ers(struct pci_dev *pdev, enum > > > pci_ers_result err_type) > > > switch (err_type) { > > > case PCI_ERS_RESULT_NONE: > > > case PCI_ERS_RESULT_CAN_RECOVER: > > > + case PCI_ERS_RESULT_NEED_RESET: > > > envp[idx++] = "ERROR_EVENT=BEGIN_RECOVERY"; > > > envp[idx++] = "DEVICE_ONLINE=0"; > > > break; > > > > I note that PCI_ERS_RESULT_NO_AER_DRIVER is also missing in that > > switch/case statement. I guess for the patch to be complete, > > it needs to be added to the PCI_ERS_RESULT_DISCONNECT case. > > Do you agree? > > I realize now there's a bigger problem here: In pcie_do_recovery(), > when control reaches the "failed:" label, a uevent is only signaled > for the *bridge*. Shouldn't a uevent instead be signaled for every > device *below* the bridge? (And possibly the bridge itself if it was > the device reporting the error.)
The small patch below should resolve this issue. Please let me know what you think. > In that case you don't need to add PCI_ERS_RESULT_NO_AER_DRIVER to > the switch/case statement because we wouldn't want to have multiple > uevents reporting disconnect, so the one emitted below the "failed:" > label would be sufficient. I'll send a separate Reviewed-by for your original patch as the small patch below should resolve my concern about PCI_ERS_RESULT_NO_AER_DRIVER. > This all looks so broken that I'm starting to wonder if there's any > user space application at all that takes advantage of these uevents? I'd still be interested to know which user space application you're using to track these uevents? Thanks, Lukas -- >8 -- diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c index e795e5ae..3a95aa2 100644 --- a/drivers/pci/pcie/err.c +++ b/drivers/pci/pcie/err.c @@ -165,6 +165,12 @@ static int report_resume(struct pci_dev *dev, void *data) return 0; } +static int report_disconnect(struct pci_dev *dev, void *data) +{ + pci_uevent_ers(dev, PCI_ERS_RESULT_DISCONNECT); + return 0; +} + /** * pci_walk_bridge - walk bridges potentially AER affected * @bridge: bridge which may be a Port, an RCEC, or an RCiEP @@ -272,7 +278,7 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, failed: pci_walk_bridge(bridge, pci_pm_runtime_put, NULL); - pci_uevent_ers(bridge, PCI_ERS_RESULT_DISCONNECT); + pci_walk_bridge(bridge, report_disconnect, NULL); pci_info(bridge, "device recovery failed\n");