On Thu, Nov 13, 2025 at 9:49 PM Rafael J. Wysocki <[email protected]> wrote: > > On Sun, Oct 12, 2025 at 3:30 PM Lukas Wunner <[email protected]> wrote: > > > > When the PCI core gained power management support in 2002, it introduced > > pci_save_state() and pci_restore_state() helpers to restore Config Space > > after a D3hot or D3cold transition, which implies a Soft or Fundamental > > Reset (PCIe r7.0 sec 5.8): > > > > https://git.kernel.org/tglx/history/c/a5287abe398b > > > > In 2006, EEH and AER were introduced to recover from errors by performing > > a reset. Because errors can occur at any time, drivers began calling > > pci_save_state() on probe to ensure recoverability. > > > > In 2009, recoverability was foiled by commit c82f63e411f1 ("PCI: check > > saved state before restore"): It amended pci_restore_state() to bail out > > if the "state_saved" flag has been cleared. The flag is cleared by > > pci_restore_state() itself, hence a saved state is now allowed to be > > restored only once and is then invalidated. That doesn't seem to make > > sense because the saved state should be good enough to be reused. > > > > Soon after, drivers began to work around this behavior by calling > > pci_save_state() immediately after pci_restore_state(), see e.g. commit > > b94f2d775a71 ("igb: call pci_save_state after pci_restore_state"). > > Hilariously, two drivers even set the "saved_state" flag to true before > > invoking pci_restore_state(), see ipr_reset_restore_cfg_space() and > > e1000_io_slot_reset(). > > > > Despite these workarounds, recoverability at all times is not guaranteed: > > E.g. when a PCIe port goes through a runtime suspend and resume cycle, > > the "saved_state" flag is cleared by: > > > > pci_pm_runtime_resume() > > pci_pm_default_resume_early() > > pci_restore_state() > > > > ... and hence on a subsequent AER event, the port's Config Space cannot be > > restored. Riana reports a recovery failure of a GPU-integrated PCIe > > switch and has root-caused it to the behavior of pci_restore_state(). > > Another workaround would be necessary, namely calling pci_save_state() in > > pcie_port_device_runtime_resume(). > > > > The motivation of commit c82f63e411f1 was to prevent restoring state if > > pci_save_state() hasn't been called before. But that can be achieved by > > saving state already on device addition, after Config Space has been > > initialized. A desirable side effect is that devices become recoverable > > even if no driver gets bound. This renders the commit unnecessary, so > > revert it. > > > > Reported-by: Riana Tauro <[email protected]> # off-list > > Tested-by: Riana Tauro <[email protected]> > > Signed-off-by: Lukas Wunner <[email protected]> > > --- > > Proof that removing the check in pci_restore_state() makes no > > difference for the PCI core: > > > > * The only relevant invocations of pci_restore_state() are in > > drivers/pci/pci-driver.c > > * One invocation is in pci_restore_standard_config(), which is > > always called conditionally on "if (pci_dev->state_saved)". > > So the check at the beginning of pci_restore_state() which > > this patch removes is an unnecessary duplication. > > * Another invocation is in pci_pm_default_resume_early(), which > > is called from pci_pm_resume_noirq(), pci_pm_restore_noirq() > > and pci_pm_runtime_resume(). Those functions are only called > > after prior calls to pci_pm_suspend_noirq(), pci_pm_freeze_noirq(), > > and pci_pm_runtime_suspend(), respectively. And all of them > > call pci_save_state(). So the "if (!dev->state_saved)" check > > in pci_restore_state() never evaluates to true. > > * A third invocation is in pci_pm_thaw_noirq(). It is only called > > after a prior call to pci_pm_freeze_noirq(), which invokes > > pci_save_state(). So likewise the "if (!dev->state_saved)" check > > in pci_restore_state() never evaluates to true. > > > > drivers/pci/bus.c | 7 +++++++ > > drivers/pci/pci.c | 3 --- > > drivers/pci/probe.c | 2 -- > > 3 files changed, 7 insertions(+), 5 deletions(-) > > > > diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c > > index f26aec6..4318568 100644 > > --- a/drivers/pci/bus.c > > +++ b/drivers/pci/bus.c > > @@ -358,6 +358,13 @@ void pci_bus_add_device(struct pci_dev *dev) > > pci_bridge_d3_update(dev); > > > > /* > > + * Save config space for error recoverability. Clear state_saved > > + * to detect whether drivers invoked pci_save_state() on suspend. > > + */ > > + pci_save_state(dev); > > + dev->state_saved = false; > > + > > + /* > > * If the PCI device is associated with a pwrctrl device with a > > * power supply, create a device link between the PCI device and > > * pwrctrl device. This ensures that pwrctrl drivers are probed > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c > > index b14dd06..2f0da5d 100644 > > --- a/drivers/pci/pci.c > > +++ b/drivers/pci/pci.c > > @@ -1855,9 +1855,6 @@ static void pci_restore_rebar_state(struct pci_dev > > *pdev) > > */ > > void pci_restore_state(struct pci_dev *dev) > > { > > - if (!dev->state_saved) > > - return; > > - > > So after this change, is there any mechanism to clear state_saved > after a suspend-resume cycle so that the next cycle is not confused by > seeing it set?
Never mind, this hasn't changed. So I agree with Bjorn that it would be good to expand the new comment in pci_bus_add_device() because it doesn't really explain much in its current form. Or maybe even split it to say "Save config space for error recoverability" before pci_save_state(dev) and then explain why state_saved is cleared after it? Apart from this and modulo possible changelog adjustments Reviewed-by: Rafael J. Wysocki (Intel) <[email protected]>
