On Mon, 2025-09-15 at 15:50 +0200, Lukas Wunner wrote:
> Amend the documentation on PCI error recovery with specifics about
> Downstream Port Containment and Advanced Error Reporting:
>
> * Explain that with DPC, devices are inaccessible upon an error (similar
>   to EEH on powerpc) and do not become accessible until the link is
>   re-enabled.
>
> * Explain that with AER, although devices may already be accessible in the
>   ->error_detected() callback, accesses should be deferred to the
>   ->mmio_enabled() callback for compatibility with EEH on powerpc and with
>   s390.
>
> Signed-off-by: Lukas Wunner <lu...@wunner.de>
> Reviewed-by: Brian Norris <briannor...@chromium.org>
> ---
>  Documentation/PCI/pci-error-recovery.rst | 22 ++++++++++++++++++++++
>  1 file changed, 22 insertions(+)
>
> diff --git a/Documentation/PCI/pci-error-recovery.rst b/Documentation/PCI/pci-error-recovery.rst
> index d5c661baa87f..9e1e2f2a13fa 100644
> --- a/Documentation/PCI/pci-error-recovery.rst
> +++ b/Documentation/PCI/pci-error-recovery.rst
> @@ -122,6 +122,10 @@ A PCI bus error is detected by the PCI hardware. On powerpc, the slot
>  is isolated, in that all I/O is blocked: all reads return 0xffffffff,
>  all writes are ignored.
>
> +Similarly, on platforms supporting Downstream Port Containment
> +(PCIe r7.0 sec 6.2.11), the link to the sub-hierarchy with the
> +faulting device is disabled. Any device in the sub-hierarchy
> +becomes inaccessible.
>
>  STEP 1: Notification
>  --------------------
> @@ -204,6 +208,24 @@ link reset was performed by the HW. If the platform can't just re-enable IOs
>  without a slot reset or a link reset, it will not call this callback, and
>  instead will have gone directly to STEP 3 (Link Reset) or STEP 4 (Slot Reset)
>
> +.. note::
> +
> +   On platforms supporting Advanced Error Reporting (PCIe r7.0 sec 6.2),
> +   the faulting device may already be accessible in STEP 1 (Notification).
> +   Drivers should nevertheless defer accesses to STEP 2 (MMIO Enabled)
> +   to be compatible with EEH on powerpc and with s390 (where devices are
> +   inaccessible until STEP 2).
> +
> +   On platforms supporting Downstream Port Containment, the link to the
> +   sub-hierarchy with the faulting device is re-enabled in STEP 3 (Link
> +   Reset). Hence devices in the sub-hierarchy are inaccessible until
> +   STEP 4 (Slot Reset).
> +
> +   For errors such as Surprise Down (PCIe r7.0 sec 6.2.7), the device
> +   may not even be accessible in STEP 4 (Slot Reset). Drivers can detect
> +   accessibility by checking whether reads from the device return all 1's
> +   (PCI_POSSIBLE_ERROR()).
> +
>  .. note::
>
>     The following is proposed; no platform implements this yet:

Thanks for improving this. Makes sense to mention and spell this out
explicitly.

Reviewed-by: Niklas Schnelle <schne...@linux.ibm.com>