On 8/4/25 8:35 AM, Breno Leitao wrote:
Hello Sathyanarayanan,

On Mon, Aug 04, 2025 at 06:50:30AM -0700, Sathyanarayanan Kuppuswamy wrote:
On 8/4/25 2:17 AM, Breno Leitao wrote:
Similarly to pci_dev_aer_stats_incr(), pci_print_aer() may be called
when dev->aer_info is NULL. Add a NULL check before proceeding to avoid
calling aer_ratelimit() with a NULL aer_info pointer, returning 1, which
does not rate limit, given this is fatal.
Why not add it to pci_print_aer()?

This prevents a kernel crash triggered by dereferencing a NULL pointer
in aer_ratelimit(), ensuring safer handling of PCI devices that lack
AER info. This change aligns pci_print_aer() with pci_dev_aer_stats_incr()
which already performs this NULL check.
Is this happening during kernel boot? What is the frequency, and what are the
steps to reproduce? I am curious why pci_print_aer() is called for a PCI device
without aer_info. No aer_info means that particular device is already released
or in the process of being released (pci_release_dev()). Is this triggered by
using a stale pci_dev pointer?
I've reported some of these investigations here:

https://lore.kernel.org/all/buduna6darbvwfg3aogl5kimyxkggu3n4romnmq6sozut6axeu@clnx7sfsy457/

It has some details. But you did not mention details like your environment,
steps to reproduce, and how often you see it. I just want to understand in what
scenario pci_print_aer() is triggered when releasing the device. I am wondering
whether we are missing proper locking somewhere.


--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer

