On Tue, Feb 10, 2026 at 08:34:02PM +0000, Harshank Matkar wrote:
> From: Harshank Matkar <[email protected]>
>
> When ASPM L0s transitions occur on Intel I225/I226 controllers,
> transient PCIe link instability can cause register read failures
> (0xFFFFFFFF responses).

At the PCIe level, the failure is some uncorrectable PCIe error like a
Completion Timeout or Unsupported Request.  The 0xFFFFFFFF response is
implementation-specific behavior determined by the Root Complex design.

> Implement a multi-layer recovery strategy:
> 1. Immediate retries: 3 attempts with 100-200μs delays
> 2. Link retraining: Trigger PCIe link retraining via capabilities
> 3. Device detachment: Only as last resort after max attempts
>
> The recovery mechanism includes rate limiting, maximum attempt
> tracking, and device presence validation to prevent false detaches
> on transient ASPM glitches while maintaining safety through
> bounded retry limits.

I assume the glitch is a hardware erratum and should be documented as
such by Intel, although it's possible ASPM L0s isn't configured
correctly.

If it's a hardware erratum, I think you should use a quirk to disable
L0s on these devices, e.g., pci_disable_link_state(pdev,
PCIE_LINK_STATE_L0S).

Even if this patch allows recovery, the PCIe errors will be logged and
reported via AER, which will be confusing to users.

Bjorn
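
For illustration, the quirk suggested above might look roughly like the
sketch below.  It is untested and rests on assumptions: the function
name quirk_igc_disable_l0s is hypothetical, and the device IDs are
assumed to be the I225/I226 LM and V variants and should be checked
against the igc driver's ID table.  Only pci_disable_link_state(),
PCIE_LINK_STATE_L0S, pci_info() and DECLARE_PCI_FIXUP_FINAL() are
existing kernel interfaces.

```c
#include <linux/pci.h>

/*
 * Hypothetical quirk: keep ASPM L0s disabled on the link to the
 * affected controllers so the transient read failures (and the AER
 * reports they generate) do not occur in the first place.
 */
static void quirk_igc_disable_l0s(struct pci_dev *pdev)
{
	pci_info(pdev, "Disabling ASPM L0s\n");
	pci_disable_link_state(pdev, PCIE_LINK_STATE_L0S);
}

/* Device IDs below are assumptions; verify before use. */
DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x15f2, quirk_igc_disable_l0s);
DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x15f3, quirk_igc_disable_l0s);
DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x125b, quirk_igc_disable_l0s);
DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x125c, quirk_igc_disable_l0s);
```

A fixup like this would normally live in built-in code such as
drivers/pci/quirks.c.  Calling pci_disable_link_state() from the
driver's probe path is another option if the workaround should stay
local to the driver.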
