[+cc Lukas]

On Mon, May 18, 2026 at 02:23:36PM +0100, Yury Murashka wrote:
> pci_aer_clear_nonfatal_status() is not called when AER recovery fails.
> If a new AER error is subsequently reported, the AER driver calls
> find_source_device() to find the source of the error. It rescans the
> whole bus and picks the first device reporting an AER error. Because the
> previous error was never cleared, the error is attributed to the wrong
> device and AER recovery is started for the wrong device.
> 
> Add a kernel boot parameter pci=aer_clear_on_recovery_failure to clear
> AER error status even when recovery fails, preventing stale errors from
> causing incorrect device identification on subsequent AER events.

Why should we add a kernel parameter for this?  How would a user
decide whether to use the parameter?  Are there cases where we
find the source of the first error, but we *wouldn't* want to clear
it if recovery fails?

Reply via email to