On Wed, 21 May 2025 17:54:30 -0500 Bjorn Helgaas <helg...@kernel.org> wrote:
> On Wed, May 21, 2025 at 11:31:21AM +0100, Jonathan Cameron wrote: > > On Tue, 20 May 2025 16:50:32 -0500 > > Bjorn Helgaas <helg...@kernel.org> wrote: > > > > > From: Jon Pan-Doh <pan...@google.com> > > > > > > Spammy devices can flood kernel logs with AER errors and slow/stall > > > execution. Add per-device ratelimits for AER correctable and non-fatal > > > uncorrectable errors that use the kernel defaults (10 per 5s). Logging of > > > fatal errors is not ratelimited. > > > > See below. I'm not sure that logging of fatal error should affect the rate > > for non fatal errors + the rate limit infrastructure kind of assumes > > that you only call it if you are planning to respect it's decision. > > > > Given overall aim is to restrict rates, maybe we don't care if we sometimes > > throttle earlier that we might expect with a simpler separation of what > > is being limited. > > > > I don't mind strongly either way. > > > > @@ -593,7 +593,8 @@ struct aer_err_info { > > > unsigned int id:16; > > > > > > unsigned int severity:2; /* 0:NONFATAL | 1:FATAL | 2:COR */ > > > - unsigned int __pad1:5; > > > + unsigned int ratelimit:1; /* 0=skip, 1=print */ > > > > That naming is less than intuitive. Maybe expand it to ratelimit_print or > > something like that. > > True, although it does match uses like "if (aer_ratelimit(...))" > > I'll try ratelimit_print and see how you like it :) > > > > + unsigned int __pad1:4; > > > unsigned int multi_error_valid:1; > > > > > > unsigned int first_error:5; > > > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c > > > index 4f1bff0f000f..f9e684ac7878 100644 > > > --- a/drivers/pci/pcie/aer.c > > > +++ b/drivers/pci/pcie/aer.c > > > > > @@ -815,8 +843,19 @@ EXPORT_SYMBOL_NS_GPL(pci_print_aer, "CXL"); > > > */ > > > static int add_error_device(struct aer_err_info *e_info, struct pci_dev > > > *dev) > > > { > > > + /* > > > + * Ratelimit AER log messages. "dev" is either the source > > > + * identified by the root's Error Source ID or it has an unmasked > > > + * error logged in its own AER Capability. If any of these devices > > > + * has not reached its ratelimit, log messages for all of them. > > > + * Messages are emitted when "e_info->ratelimit" is non-zero. > > > + * > > > + * Note that "e_info->ratelimit" was already initialized to 1 for the > > > + * ERR_FATAL case. > > > + */ > > > if (e_info->error_dev_num < AER_MAX_MULTI_ERR_DEVICES) { > > > e_info->dev[e_info->error_dev_num] = pci_dev_get(dev); > > > + e_info->ratelimit |= aer_ratelimit(dev, e_info->severity); > > > > So this is a little odd. I think it works but there is code inside > > __ratelimit that I think we should not be calling for that > > ERROR_FATAL case (whether we should call lots of times for each > > device isn't obvious either but maybe that is more valid). > > > > In the event of it already being 1 due to ERROR_FATAL you will > > falsely trigger a potential print from inside __ratelimit() if we > > were rate limited and no longer are but only skipped FATAL prints. > > My concern is that function is kind of assuming it's only called in > > cases where a rate limit decision is being made and the > > implementation may change in future). > > Hmmm. That's pretty subtle, thanks for catching this. > > In the light of day, ".ratelimit = fatal ? 1 : 0" looks a bit sketchy. > If we want to avoid ratelimiting AER_FATAL, maybe aer_ratelimit() > should just return 1 ("print") unconditionally in that case, without > calling __ratelimit(): > > static int aer_ratelimit(struct pci_dev *dev, unsigned int severity) > { > struct ratelimit_state *ratelimit; > > if (severity == AER_FATAL) > return 1; /* AER_FATAL not ratelimited */ > > if (severity == AER_CORRECTABLE) > ratelimit = &dev->aer_info->cor_log_ratelimit; > else > ratelimit = &dev->aer_info->uncor_log_ratelimit; > > return __ratelimit(ratelimit); > } Neat solution so go with that. > > That still leaves this question of how to deal with info->dev[] when > there's more than one entry, which is kind of an annoying case that > only happens for the native AER path. > > I think it's because for a single AER interrupt from an RP/RCEC, we > collect the root info in one struct aer_err_info and scrape all the > downstream devices for anything interesting. We visit each downstream > device and is_error_source() reads its status register, but we only > keep the pci_dev pointer, so aer_get_device_error_info() has to read > the status registers *again*. This all seems kind of obtuse. > > The point of the OR above in add_error_device() was to try to match up > RP/RCEC logging with downstream device logging so they're ratelimited > the same. If we ratelimit the Error Source ID based on the RP/RCEC > and the details based on the downstream devices individually, they'll > get out of sync, so sometimes we'll print an Error Source ID and elide > the details and vice versa. > > I wanted to make it so that if we log downstream details, we also log > the Error Source ID. But maybe we should ratelimit downstream devices > individually (instead of doing this weird union) and make the RP/RCEC > part more explicit, e.g., > > add_error_device(...) > { > int i = e_info->error_dev_num; > > e_info->dev[i] = pci_dev_get(dev); > e_info->error_dev_num++; > > if (aer_ratelimit(dev, e_info->severity)) { > e_info->root_ratelimit_print = 1; > e_info->ratelimit_print[i] = 1; > } > } As it's a weird corner case, I don't really mind how you handle it. I'm not sure I grasp this last suggestion fully but can look at the full code if you do go with something like that. Jonathan > > > https://elixir.bootlin.com/linux/v6.14.7/source/lib/ratelimit.c#L56 > > > > Maybe, > > if (!info->ratelimit) > > e_info->ratelimit = aer_ratelimit(dev, > > e_info->severity); > > is an alternative option. > > That allows a multiplication factor on the rate as all device count for 1.