On Tue, May 20, 2025 at 03:33:45PM -0700, Sathyanarayanan Kuppuswamy wrote:
> On 5/20/25 2:50 PM, Bjorn Helgaas wrote:
> > From: Jon Pan-Doh <pan...@google.com>
> > 
> > Spammy devices can flood kernel logs with AER errors and slow/stall
> > execution. Add per-device ratelimits for AER correctable and non-fatal
> > uncorrectable errors that use the kernel defaults (10 per 5s).  Logging of
> > fatal errors is not ratelimited.

> > +   /* Ratelimits for errors */
> > +   struct ratelimit_state cor_log_ratelimit;
> > +   struct ratelimit_state uncor_log_ratelimit;
> 
> Nit: Do you think we should name it as nonfatal_log_ratelimit?

Maybe so.  We can always change this internal name, so I guess the
important part is the sysfs filename
("/sys/bus/pci/devices/<dev>/aer/ratelimit_burst_uncor_log").

"ratelimit_burst_nonfatal_log" is not quite parallel with
"ratelimit_burst_cor_log" the way "ratelimit_burst_uncor_log" is.

But it's definitely true that the underlying PCIe Messages are
ERR_COR, ERR_NONFATAL, and ERR_FATAL.

So I think this is more than a nit, and you're right that we should
use "cor" and "nonfatal" somehow.

I'll work on that tomorrow.
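
For illustration only, here is a minimal userspace sketch of how per-device
limits named after "cor" and "nonfatal" could behave.  It is not the code
from the patch: every model_* identifier is hypothetical, the window logic
is a simplified stand-in for the kernel's struct ratelimit_state, and time
is passed in as plain milliseconds.  The only values taken from the thread
are the defaults (10 messages per 5s) and the rule that fatal errors are
never ratelimited.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model; NOT the kernel's struct ratelimit_state. */
struct model_ratelimit {
	uint64_t interval_ms;	/* length of one window */
	unsigned int burst;	/* messages allowed per window */
	uint64_t begin_ms;	/* start of the current window */
	unsigned int printed;	/* messages allowed so far in this window */
	bool started;		/* false until the first message arrives */
};

/* Defaults quoted in the commit log: 10 messages per 5 seconds. */
#define MODEL_DEFAULT_INTERVAL_MS	5000
#define MODEL_DEFAULT_BURST		10

/* Field names follow the "cor"/"nonfatal" suggestion from the review. */
struct model_aer_info {
	struct model_ratelimit cor_log_ratelimit;
	struct model_ratelimit nonfatal_log_ratelimit;
};

/* Mirrors the three PCIe error Messages: ERR_COR, ERR_NONFATAL, ERR_FATAL. */
enum model_severity {
	MODEL_ERR_COR,
	MODEL_ERR_NONFATAL,
	MODEL_ERR_FATAL,
};

static void model_ratelimit_init(struct model_ratelimit *rs)
{
	assert(rs);
	rs->interval_ms = MODEL_DEFAULT_INTERVAL_MS;
	rs->burst = MODEL_DEFAULT_BURST;
	rs->begin_ms = 0;
	rs->printed = 0;
	rs->started = false;
}

static void model_aer_info_init(struct model_aer_info *info)
{
	assert(info);
	model_ratelimit_init(&info->cor_log_ratelimit);
	model_ratelimit_init(&info->nonfatal_log_ratelimit);
}

/* Return true if a message may be logged at time now_ms. */
static bool model_ratelimit_allow(struct model_ratelimit *rs, uint64_t now_ms)
{
	assert(rs);

	/* Open a new window on first use or once the interval has elapsed. */
	if (!rs->started || now_ms - rs->begin_ms >= rs->interval_ms) {
		rs->begin_ms = now_ms;
		rs->printed = 0;
		rs->started = true;
	}

	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}
	return false;
}

/* Pick the limit by severity; fatal errors bypass ratelimiting. */
static bool model_aer_should_log(struct model_aer_info *info,
				 enum model_severity sev, uint64_t now_ms)
{
	assert(info);

	switch (sev) {
	case MODEL_ERR_COR:
		return model_ratelimit_allow(&info->cor_log_ratelimit, now_ms);
	case MODEL_ERR_NONFATAL:
		return model_ratelimit_allow(&info->nonfatal_log_ratelimit,
					     now_ms);
	default:
		return true;
	}
}
```

In this model the two limits are independent, so a device that floods
correctable errors would still get its non-fatal errors logged until that
separate budget runs out.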
