On Tue, 10 Feb 2026 01:11:41 -0800 Breno Leitao <[email protected]> wrote:
> Hello Andrew,
>
> On Mon, Feb 02, 2026 at 06:27:38AM -0800, Breno Leitao wrote:
> > The kernel already tracks recoverable hardware errors (CPU, memory, PCI,
> > CXL, etc.) in the hwerr_data array for vmcoreinfo crash dump analysis.
> > However, this data is only accessible after a crash.
> >
> > This series adds a sysfs directory at /sys/kernel/hwerr_recovery_stats/ to
> > expose these statistics at runtime, allowing monitoring tools to track
> > hardware health without requiring a kernel crash.
> >
> > The directory contains one file per error subsystem:
> > /sys/kernel/hwerr_recovery_stats/{cpu, memory, pci, cxl, others}
> >
> > Each file contains a single integer representing the error count.
> >
> > This is useful for:
> > - Proactive detection of failing hardware components
> > - Time-series tracking of recoverable errors
> > - System health monitoring in cloud environments
>
> Is there a chance this could be included in the 6.20 merge window?

During the 7.0 merge window? Sure. I'll be taking a look at this (and
a whole lot more) after 7.0-rc1 is released.