On Mon, Jan 26, 2026 at 10:42:06AM -0800, Kuppuswamy Sathyanarayanan wrote:
> On 1/25/2026 1:25 AM, Lukas Wunner wrote:
> > Correctable and Uncorrectable Error Status Registers on reporting agents
> > are cleared upon PCI device enumeration in pci_aer_init() to flush past
> > events. They're cleared again when an error is handled by the AER driver.
> >
> > If an agent reports a new error after pci_aer_init() and before the AER
> > driver has probed on the corresponding Root Port or Root Complex Event
> > Collector, that error is not handled by the AER driver: It clears the
> > Root Error Status Register on probe, but neglects to re-clear the
> > Correctable and Uncorrectable Error Status Registers on reporting agents.
> >
> > The error will eventually be reported when another error occurs. Which
> > is irritating because to an end user it appears as if the earlier error
> > has just happened.
> >
> > Amend the AER driver to clear stale errors on reporting agents upon probe.
> >
> > Skip reporting agents which have not invoked pci_aer_init() yet to avoid
> > using an uninitialized pdev->aer_cap. They're recognizable by the error
> > bits in the Device Control register still being clear.
> >
> > Reporting agents may execute pci_aer_init() after the AER driver has
> > probed, particularly when devices are hotplugged or removed/rescanned via
> > sysfs. For this reason, it continues to be necessary that pci_aer_init()
> > clears Correctable and Uncorrectable Error Status Registers.
>
> Can you include details about where and in what configuration you observed
> this issue?

The issue was observed on an upcoming Xeon "Diamond Rapids" platform,
where certain Root Complex Integrated Endpoints (the Data Streaming
Accelerator and In-Memory Analytics Accelerator) raise a Correctable Error
of type "Advisory Non-Fatal Error" when certain fields in Config Space are
accessed. The RCiEPs send an ERR_COR Message to their Root Complex Event
Collector, but it is not handled because the AER driver hasn't probed yet.
When it does probe later, it only clears the error bits of the RCEC, not
those of the RCiEPs.
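
To make the sequence concrete, here is a toy user-space model of it.
This is only a sketch, not kernel code: struct model and all function
names are made up for illustration, only the bit positions follow the
AER register layout.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions follow the AER registers; everything else is made up. */
#define COR_BAD_TLP		(1u << 6)	/* Bad TLP Status */
#define COR_ADV_NONFATAL	(1u << 13)	/* Advisory Non-Fatal Error Status */
#define ROOT_COR_RCV		(1u << 0)	/* ERR_COR Received */

struct model {
	uint32_t agent_cor_status;	/* RCiEP: Correctable Error Status */
	uint32_t root_status;		/* RCEC: Root Error Status */
	bool driver_bound;		/* has the AER driver probed? */
};

/* Enumeration: past events on the agent are flushed. */
static void enumerate_agent(struct model *m)
{
	m->agent_cor_status = 0;
}

/* AER driver probe; clear_agents selects the behavior with the patch. */
static void driver_probe(struct model *m, bool clear_agents)
{
	m->root_status = 0;
	if (clear_agents)
		m->agent_cor_status = 0;
	m->driver_bound = true;
}

/*
 * Agent logs an error and signals ERR_COR.  Returns the bits the driver
 * reports for the agent, or 0 if no driver is bound to handle the message.
 */
static uint32_t agent_error(struct model *m, uint32_t bit)
{
	uint32_t reported;

	m->agent_cor_status |= bit;
	m->root_status |= ROOT_COR_RCV;
	if (!m->driver_bound)
		return 0;

	reported = m->agent_cor_status;
	m->agent_cor_status = 0;	/* handling clears the agent ... */
	m->root_status = 0;		/* ... and the collector */
	return reported;
}

/* Error before probe, then probe, then an unrelated error after probe. */
static uint32_t simulate(bool clear_agents)
{
	struct model m = { 0 };

	enumerate_agent(&m);
	agent_error(&m, COR_ADV_NONFATAL);	/* not handled: no driver yet */
	driver_probe(&m, clear_agents);
	return agent_error(&m, COR_BAD_TLP);
}
```

In this model, simulate(false) returns both bits, so the earlier Advisory
Non-Fatal Error looks as if it had just happened, whereas simulate(true)
returns only the Bad TLP bit.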

Since this platform is not yet in customers' hands and the issue
apparently wasn't observed on other platforms before, I refrained
from including those details in the commit message. But I can respin
and include them, or Bjorn may choose to amend the commit message
with those details if/when applying the patch.

> > +static int clear_status_iter(struct pci_dev *dev, void *data)
> > +{
> > +	u16 devctl;
> > +
> > +	/* Skip if pci_enable_pcie_error_reporting() hasn't been called yet */
> > +	pcie_capability_read_word(dev, PCI_EXP_DEVCTL, &devctl);
> > +	if (!(devctl & PCI_EXP_AER_FLAGS))
> > +		return 0;
> > +
> > +	pci_aer_clear_status(dev);
> > +	pcie_clear_device_status(dev);
>
> Should pci_aer_init() also clear device status along with uncor/cor
> error status?

Hm, good question.  For AER-supporting devices, it probably makes sense
since we're also clearing the bits when handling an error.

It's unclear what to do on non-AER-supporting devices.  PCIe r7.0 sec 6.2.1
calls this "baseline capability" error signaling. If a device doesn't
support AER, I don't think we get a (spec-defined) interrupt to report
and clear errors. But the device may still raise an interrupt which
would then be received and handled by its driver in some custom way.

So I guess that on "baseline capability" devices, it is the job of the
device driver to report and clear errors. One could argue that it's
also the driver's job to clear stale bits on probe. Because if the
kernel does that on enumeration, new errors may occur until the driver
probes and so the driver would have to clear stale bits on probe
anyway.
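
As a rough sketch of what "clearing stale bits on probe" could look like
for such a device: the error bits in the Device Status register are
write-1-to-clear, so the driver would read the register and write the set
error bits back.  The model below is user-space C with hypothetical names
(struct fake_dev, devsta_write(), clear_stale_errors()); a real driver
would go through the pcie_capability_read_word() / _write_word()
accessors instead.

```c
#include <stdint.h>

/* Device Status bits, positions as in the PCIe Device Status register */
#define DEVSTA_CED	0x0001	/* Correctable Error Detected */
#define DEVSTA_NFED	0x0002	/* Non-Fatal Error Detected */
#define DEVSTA_FED	0x0004	/* Fatal Error Detected */
#define DEVSTA_URD	0x0008	/* Unsupported Request Detected */
#define DEVSTA_ERR_MASK	(DEVSTA_CED | DEVSTA_NFED | DEVSTA_FED | DEVSTA_URD)

/* Hypothetical stand-in for a device's Device Status register. */
struct fake_dev {
	uint16_t devsta;
};

/* Model of an RW1C write: each 1 in val clears that error bit. */
static void devsta_write(struct fake_dev *d, uint16_t val)
{
	d->devsta &= (uint16_t)~(val & DEVSTA_ERR_MASK);
}

/*
 * What a driver might do early in probe: drop errors that were logged
 * before it bound.  Returns the stale bits so they could be reported.
 */
static uint16_t clear_stale_errors(struct fake_dev *d)
{
	uint16_t stale = d->devsta & DEVSTA_ERR_MASK;

	if (stale)
		devsta_write(d, stale);
	return stale;
}
```

Non-error bits such as Transactions Pending are read-only and are left
untouched by the write-back in this model.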

I can look into amending pci_aer_init() to clear the Device Status
error bits on AER-supporting devices, but it's an orthogonal issue
to the one addressed by this patch.

Thanks,
Lukas