On 01/29/26 at 05:34am, Breno Leitao wrote:
> Add a sysfs file at /sys/kernel/vmcore_stats and expose hardware error
> recovery statistics that are already tracked by the kernel. This allows
> userspace monitoring tools to track recovered hardware errors without
> requiring kernel crashes.

I don't understand. If this works without requiring kernel crashes, why
is it called vmcore_stats? It simply shows the hardware error recovery
statistics tracked by the kernel, so can we name it
/sys/kernel/hwerr_stats instead? It obviously has nothing to do with
vmcore, does it?

>
> This is useful to track recoverable hardware errors in a time series,
> even if the host doesn't crash.
>
> Create a generic vmcore_stats sysfs, and add a section for
> hwerr_recovery that shows the counts per subsystem and timestamps:
>
> - cpu: CPU-related errors (MCE, ARM processor errors)
> - memory: Memory-related errors
> - pci: PCI/PCIe AER non-fatal errors
> - cxl: CXL errors
> - other: Other hardware errors
>
> Example output:
>   hwerr_recovery:
>     cpu: 0 (0)
>     memory: 2 (1738148257)
>     pci: 1 (1738147000)
>     cxl: 0 (0)
>     other: 0 (0)
>
> The value in parentheses is the timestamp (seconds since epoch) of the
> last error of that type, or 0 if no errors have occurred.
>
> These statistics provide visibility into the health of the system's
> hardware and can be used by system administrators to proactively detect
> failing components before they cause system crashes.
>
> Signed-off-by: Breno Leitao <[email protected]>
> ---
> To: [email protected]
> Cc: [email protected]
> To: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> ---
>  .../ABI/testing/sysfs-kernel-vmcore_stats | 23 ++++++++++++++++
>  kernel/vmcore_info.c                      | 31 ++++++++++++++++++++++
>  2 files changed, 54 insertions(+)
>
> diff --git a/Documentation/ABI/testing/sysfs-kernel-vmcore_stats b/Documentation/ABI/testing/sysfs-kernel-vmcore_stats
> new file mode 100644
> index 0000000000000..b42f18d24c00b
> --- /dev/null
> +++ b/Documentation/ABI/testing/sysfs-kernel-vmcore_stats
> @@ -0,0 +1,23 @@
> +What:           /sys/kernel/vmcore_stats
> +Date:           January 2026
> +KernelVersion:  6.20
> +Contact:        Breno Leitao <[email protected]>
> +Description:
> +                Shows statistics related to vmcore functionality. Currently
> +                includes hardware error recovery statistics.
> +
> +                Format:
> +                  Recovered hardware errors:
> +                    metric: count (timestamp)
> +
> +                Statistics about recoverable hardware errors that the kernel
> +                has handled since boot. Each metric shows the count and
> +                timestamp (seconds since epoch) of the last error in
> +                parentheses (0 if no errors have occurred).
> +
> +                Metrics:
> +                - cpu: CPU-related errors (MCE, ARM processor errors)
> +                - memory: Memory-related errors
> +                - pci: PCI/PCIe AER non-fatal errors
> +                - cxl: CXL (Compute Express Link) errors
> +                - other: Other hardware errors
> diff --git a/kernel/vmcore_info.c b/kernel/vmcore_info.c
> index fe9bf8db1922e..5974b4be08cbc 100644
> --- a/kernel/vmcore_info.c
> +++ b/kernel/vmcore_info.c
> @@ -6,6 +6,8 @@
>
>  #include <linux/buildid.h>
>  #include <linux/init.h>
> +#include <linux/kobject.h>
> +#include <linux/sysfs.h>
>  #include <linux/utsname.h>
>  #include <linux/vmalloc.h>
>  #include <linux/sizes.h>
> @@ -135,6 +137,31 @@ void hwerr_log_error_type(enum hwerr_error_type src)
>  }
>  EXPORT_SYMBOL_GPL(hwerr_log_error_type);
>
> +/* sysfs interface for hardware error recovery statistics */
> +static ssize_t vmcore_stats_show(struct kobject *kobj,
> +                                 struct kobj_attribute *attr, char *buf)
> +{
> +        return sysfs_emit(buf,
> +                "Recovered hardware errors:\n"
> +                " cpu: %d (%lld)\n"
> +                " memory: %d (%lld)\n"
> +                " pci: %d (%lld)\n"
> +                " cxl: %d (%lld)\n"
> +                " other: %d (%lld)\n",
> +                atomic_read(&hwerr_data[HWERR_RECOV_CPU].count),
> +                (long long)READ_ONCE(hwerr_data[HWERR_RECOV_CPU].timestamp),
> +                atomic_read(&hwerr_data[HWERR_RECOV_MEMORY].count),
> +                (long long)READ_ONCE(hwerr_data[HWERR_RECOV_MEMORY].timestamp),
> +                atomic_read(&hwerr_data[HWERR_RECOV_PCI].count),
> +                (long long)READ_ONCE(hwerr_data[HWERR_RECOV_PCI].timestamp),
> +                atomic_read(&hwerr_data[HWERR_RECOV_CXL].count),
> +                (long long)READ_ONCE(hwerr_data[HWERR_RECOV_CXL].timestamp),
> +                atomic_read(&hwerr_data[HWERR_RECOV_OTHERS].count),
> +                (long long)READ_ONCE(hwerr_data[HWERR_RECOV_OTHERS].timestamp));
> +}
> +
> +static struct kobj_attribute vmcore_stats_attr = __ATTR_RO(vmcore_stats);
> +
>  static int __init crash_save_vmcoreinfo_init(void)
>  {
>          vmcoreinfo_data = (unsigned char *)get_zeroed_page(GFP_KERNEL);
> @@ -244,6 +271,10 @@ static int __init crash_save_vmcoreinfo_init(void)
>          arch_crash_save_vmcoreinfo();
>          update_vmcoreinfo_note();
>
> +        /* Create /sys/kernel/vmcore_stats */
> +        if (sysfs_create_file(kernel_kobj, &vmcore_stats_attr.attr))
> +                pr_warn("Failed to create vmcore_stats sysfs file\n");
> +
>          return 0;
>  }
>
>
> ---
> base-commit: 8dfce8991b95d8625d0a1d2896e42f93b9d7f68d
> change-id: 20260129-vmcoreinfo_sysfs-ff4687979cd5
>
> Best regards,
> --
> Breno Leitao <[email protected]>
>
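
To make the userspace side of the proposed format concrete, here is a
minimal sketch of how a monitoring tool might parse it. It assumes only
the "name: count (timestamp)" layout quoted above and skips any header
line, whichever wording it ends up with. struct hwerr_metric,
parse_metric_line() and parse_hwerr_stats() are hypothetical names used
for illustration; they are not part of the patch or of any kernel API.

```c
#include <stdio.h>
#include <string.h>

/* One parsed metric line, e.g. " memory: 2 (1738148257)". Hypothetical type. */
struct hwerr_metric {
	char name[16];
	int count;
	long long last_ts;	/* seconds since epoch, 0 if no error was seen */
};

/*
 * Parse a single "name: count (timestamp)" line.
 * Returns 1 on success, 0 for header lines or malformed input.
 */
static int parse_metric_line(const char *line, struct hwerr_metric *m)
{
	/* %15[a-z] only accepts lowercase names, so header lines are rejected. */
	return sscanf(line, " %15[a-z]: %d (%lld)",
		      m->name, &m->count, &m->last_ts) == 3;
}

/*
 * Parse a whole buffer in the proposed format, one metric per line.
 * Returns the number of metrics stored in out[] (at most max).
 */
static size_t parse_hwerr_stats(const char *buf, struct hwerr_metric *out,
				size_t max)
{
	size_t n = 0;

	while (*buf && n < max) {
		size_t len = strcspn(buf, "\n");
		char line[128];

		/* Overlong lines cannot be valid metrics; skip them. */
		if (len < sizeof(line)) {
			memcpy(line, buf, len);
			line[len] = '\0';
			if (parse_metric_line(line, &out[n]))
				n++;
		}
		buf += len;
		if (*buf == '\n')
			buf++;
	}
	return n;
}
```

A caller would read the sysfs file (at whatever path is finally chosen)
into a buffer with fopen()/fread() and hand it to parse_hwerr_stats().
Note that this takes several lines of parsing for a single file, which is
the cost of emitting more than one value per sysfs attribute.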
