On Tue, May 20, 2025 at 03:33:45PM -0700, Sathyanarayanan Kuppuswamy wrote:
> On 5/20/25 2:50 PM, Bjorn Helgaas wrote:
> > From: Jon Pan-Doh <pan...@google.com>
> >
> > Spammy devices can flood kernel logs with AER errors and slow/stall
> > execution. Add per-device ratelimits for AER correctable and non-fatal
> > uncorrectable errors that use the kernel defaults (10 per 5s). Logging of
> > fatal errors is not ratelimited.
> > +	/* Ratelimits for errors */
> > +	struct ratelimit_state cor_log_ratelimit;
> > +	struct ratelimit_state uncor_log_ratelimit;
>
> Nit: Do you think we should name it as nonfatal_log_ratelimit?

Maybe so.  We can always change this internal name, so I guess the
important part is the sysfs filename
("/sys/bus/pci/devices/<dev>/aer/ratelimit_burst_uncor_log").

"ratelimit_burst_nonfatal_log" is not quite parallel with
"ratelimit_burst_cor_log" the way "ratelimit_burst_uncor_log" is.

But it's definitely true that the underlying PCIe Messages are
ERR_COR, ERR_NONFATAL, and ERR_FATAL.  So I think this is more than a
nit, and you're right that we should use "cor" and "nonfatal" somehow.
I'll work on that tomorrow.
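
For anyone following along, here is a rough userspace sketch of the
"10 per 5s" interval/burst behavior the commit log describes.  It is
only a model: "struct log_ratelimit" and both helpers are hypothetical
names, not the kernel's struct ratelimit_state or ___ratelimit(), and
it leaves out the locking and jiffies handling the real code needs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model; not the kernel's struct ratelimit_state. */
struct log_ratelimit {
	uint64_t interval_ms;	/* window length */
	unsigned int burst;	/* messages allowed per window */
	uint64_t window_start;	/* start of the current window, in ms */
	unsigned int printed;	/* messages emitted in the current window */
	unsigned int missed;	/* messages suppressed in the current window */
};

static void log_ratelimit_init(struct log_ratelimit *rs,
			       uint64_t interval_ms, unsigned int burst)
{
	assert(rs && burst > 0);
	rs->interval_ms = interval_ms;
	rs->burst = burst;
	rs->window_start = 0;
	rs->printed = 0;
	rs->missed = 0;
}

/* Return true if a message arriving at now_ms may be logged. */
static bool log_ratelimit_ok(struct log_ratelimit *rs, uint64_t now_ms)
{
	/* Start a fresh window once the interval has elapsed. */
	if (now_ms - rs->window_start >= rs->interval_ms) {
		rs->window_start = now_ms;
		rs->printed = 0;
		rs->missed = 0;
	}

	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}

	rs->missed++;
	return false;
}
```

In this model, each device would carry one such state for correctable
errors and one for non-fatal errors, initialized with an interval of
5000 ms and a burst of 10, and the fatal path would skip the check
entirely.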