On Tue, May 19, 2026 at 05:05:20PM +0100, Yury M. wrote:
> Root port can detect AER error with source 0000:00:00.0.
>
> In this case, we call find_source_device -> find_device_iter. The
> 'multi-error' flag is not set, and we are looking for the first error (not
> all). This means that for any error with the 0000:00:00.0 source on the root
> port, we will report the error for the first device on the bus.

No, is_error_source() considers bus number 0 as a bogus number and will
iterate over all devices on the bus.

> In my case, an AER error reported by 0000:06:08.0 will be logged as an error
> reported by 0000:06:07.0 if AER recovery constantly fails.

The problem is that 0000:06:08.0 reports an Advisory Non-Fatal Error,
i.e. it sets the ANFE bit in the Correctable Error Status Register and
signals (only) a Correctable Error, even though it also sets bits in the
Uncorrectable Error Status Register.

The kernel lacks support for ANFE handling and will only clear the bits
in the Correctable Error Status Register. It neglects to also clear
(and report) the bits in the Uncorrectable Error Status Register.

There was an effort two years back to bring up ANFE support but it
fizzled out. I talked to the submitter and he's now busy with other
things:

https://lore.kernel.org/r/[email protected]/

It's on my todo list to respin his series but I can't promise when
I'll get to it.

Thanks,

Lukas
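
To illustrate the ANFE case described above, here is a minimal userspace
sketch, not kernel code. It assumes that the uncorrectable bits belonging
to an Advisory Non-Fatal Error are those set in the Uncorrectable Error
Status Register whose severity is configured as non-fatal. The helper
name anfe_uncor_bits() is hypothetical; the bit values are meant to
mirror include/uapi/linux/pci_regs.h.

```c
#include <stdint.h>

/* Bit values intended to match include/uapi/linux/pci_regs.h */
#define PCI_ERR_COR_ADV_NFAT	0x00002000u	/* Advisory Non-Fatal (Correctable Status) */
#define PCI_ERR_UNC_MALF_TLP	0x00040000u	/* Malformed TLP (Uncorrectable Status) */
#define PCI_ERR_UNC_UNSUP	0x00100000u	/* Unsupported Request (Uncorrectable Status) */

/*
 * Hypothetical helper: return the Uncorrectable Error Status bits that
 * would additionally need to be reported and cleared when a device
 * signals an Advisory Non-Fatal Error.
 *
 * Assumption: only bits whose severity is non-fatal (severity bit == 0)
 * can have been signaled as advisory. If the ANFE bit is not set in the
 * Correctable Error Status Register, nothing is attributed to ANFE.
 */
static uint32_t anfe_uncor_bits(uint32_t cor_status, uint32_t uncor_status,
				uint32_t uncor_severity)
{
	if (!(cor_status & PCI_ERR_COR_ADV_NFAT))
		return 0;

	return uncor_status & ~uncor_severity;
}
```

With the ANFE bit set, Unsupported Request logged as non-fatal and
Malformed TLP configured as fatal, the helper returns only
PCI_ERR_UNC_UNSUP. Clearing just the Correctable Error Status Register
would leave that bit set in the Uncorrectable Error Status Register.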
