On Sun, Oct 12, 2025 at 3:30 PM Lukas Wunner <[email protected]> wrote: > > When the PCI core gained power management support in 2002, it introduced > pci_save_state() and pci_restore_state() helpers to restore Config Space > after a D3hot or D3cold transition, which implies a Soft or Fundamental > Reset (PCIe r7.0 sec 5.8): > > https://git.kernel.org/tglx/history/c/a5287abe398b > > In 2006, EEH and AER were introduced to recover from errors by performing > a reset. Because errors can occur at any time, drivers began calling > pci_save_state() on probe to ensure recoverability. > > In 2009, recoverability was foiled by commit c82f63e411f1 ("PCI: check > saved state before restore"): It amended pci_restore_state() to bail out > if the "state_saved" flag has been cleared. The flag is cleared by > pci_restore_state() itself, hence a saved state is now allowed to be > restored only once and is then invalidated. That doesn't seem to make > sense because the saved state should be good enough to be reused. > > Soon after, drivers began to work around this behavior by calling > pci_save_state() immediately after pci_restore_state(), see e.g. commit > b94f2d775a71 ("igb: call pci_save_state after pci_restore_state"). > Hilariously, two drivers even set the "saved_state" flag to true before > invoking pci_restore_state(), see ipr_reset_restore_cfg_space() and > e1000_io_slot_reset(). > > Despite these workarounds, recoverability at all times is not guaranteed: > E.g. when a PCIe port goes through a runtime suspend and resume cycle, > the "saved_state" flag is cleared by: > > pci_pm_runtime_resume() > pci_pm_default_resume_early() > pci_restore_state() > > ... and hence on a subsequent AER event, the port's Config Space cannot be > restored. Riana reports a recovery failure of a GPU-integrated PCIe > switch and has root-caused it to the behavior of pci_restore_state(). > Another workaround would be necessary, namely calling pci_save_state() in > pcie_port_device_runtime_resume(). > > The motivation of commit c82f63e411f1 was to prevent restoring state if > pci_save_state() hasn't been called before. But that can be achieved by > saving state already on device addition, after Config Space has been > initialized. A desirable side effect is that devices become recoverable > even if no driver gets bound. This renders the commit unnecessary, so > revert it. > > Reported-by: Riana Tauro <[email protected]> # off-list > Tested-by: Riana Tauro <[email protected]> > Signed-off-by: Lukas Wunner <[email protected]> > --- > Proof that removing the check in pci_restore_state() makes no > difference for the PCI core: > > * The only relevant invocations of pci_restore_state() are in > drivers/pci/pci-driver.c > * One invocation is in pci_restore_standard_config(), which is > always called conditionally on "if (pci_dev->state_saved)". > So the check at the beginning of pci_restore_state() which > this patch removes is an unnecessary duplication. > * Another invocation is in pci_pm_default_resume_early(), which > is called from pci_pm_resume_noirq(), pci_pm_restore_noirq() > and pci_pm_runtime_resume(). Those functions are only called > after prior calls to pci_pm_suspend_noirq(), pci_pm_freeze_noirq(), > and pci_pm_runtime_suspend(), respectively. And all of them > call pci_save_state(). So the "if (!dev->state_saved)" check > in pci_restore_state() never evaluates to true. > * A third invocation is in pci_pm_thaw_noirq(). It is only called > after a prior call to pci_pm_freeze_noirq(), which invokes > pci_save_state(). So likewise the "if (!dev->state_saved)" check > in pci_restore_state() never evaluates to true. > > drivers/pci/bus.c | 7 +++++++ > drivers/pci/pci.c | 3 --- > drivers/pci/probe.c | 2 -- > 3 files changed, 7 insertions(+), 5 deletions(-) > > diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c > index f26aec6..4318568 100644 > --- a/drivers/pci/bus.c > +++ b/drivers/pci/bus.c > @@ -358,6 +358,13 @@ void pci_bus_add_device(struct pci_dev *dev) > pci_bridge_d3_update(dev); > > /* > + * Save config space for error recoverability. Clear state_saved > + * to detect whether drivers invoked pci_save_state() on suspend. > + */ > + pci_save_state(dev); > + dev->state_saved = false; > + > + /* > * If the PCI device is associated with a pwrctrl device with a > * power supply, create a device link between the PCI device and > * pwrctrl device. This ensures that pwrctrl drivers are probed > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c > index b14dd06..2f0da5d 100644 > --- a/drivers/pci/pci.c > +++ b/drivers/pci/pci.c > @@ -1855,9 +1855,6 @@ static void pci_restore_rebar_state(struct pci_dev > *pdev) > */ > void pci_restore_state(struct pci_dev *dev) > { > - if (!dev->state_saved) > - return; > -
So after this change, is there any mechanism to clear state_saved after a suspend-resume cycle so that the next cycle is not confused by seeing it set? > pci_restore_pcie_state(dev); > pci_restore_pasid_state(dev); > pci_restore_pri_state(dev); > diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c > index c83e75a..c7c7a3d 100644 > --- a/drivers/pci/probe.c > +++ b/drivers/pci/probe.c > @@ -2747,8 +2747,6 @@ void pci_device_add(struct pci_dev *dev, struct pci_bus > *bus) > > pci_reassigndev_resource_alignment(dev); > > - dev->state_saved = false; > - > pci_init_capabilities(dev); > > /* > --
