On 8/4/25 2:17 AM, Breno Leitao wrote:
Similarly to pci_dev_aer_stats_incr(), pci_print_aer() may be called
when dev->aer_info is NULL. Add a NULL check before proceeding to avoid
calling aer_ratelimit() with a NULL aer_info pointer, returning 1, which
does not rate limit, given this is fatal.
Why not add it to pci_print_aer() ?
This prevents a kernel crash triggered by dereferencing a NULL pointer
in aer_ratelimit(), ensuring safer handling of PCI devices that lack
AER info. This change aligns pci_print_aer() with pci_dev_aer_stats_incr()
which already performs this NULL check.
Is this happening during the kernel boot ? What is the frequency and steps
to reproduce? I am curious about why pci_print_aer() is called for a PCI device
without aer_info. Not aer_info means, that particular device is already released
or in the process of release (pci_release_dev()). Is this triggered by using a
stale
pci_dev pointer?
Signed-off-by: Breno Leitao <lei...@debian.org>
Fixes: a57f2bfb4a5863 ("PCI/AER: Ratelimit correctable and non-fatal error
logging")
---
drivers/pci/pcie/aer.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 70ac661883672..b5f96fde4dcda 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -786,6 +786,9 @@ static void pci_rootport_aer_stats_incr(struct pci_dev
*pdev,
static int aer_ratelimit(struct pci_dev *dev, unsigned int severity)
{
+ if (!dev->aer_info)
+ return 1;
+
switch (severity) {
case AER_NONFATAL:
return __ratelimit(&dev->aer_info->nonfatal_ratelimit);
---
base-commit: 89748acdf226fd1a8775ff6fa2703f8412b286c8
change-id: 20250801-aer_crash_2-b21cc2ef0d00
Best regards,
--
Breno Leitao <lei...@debian.org>
--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer