On 8/6/2025 4:45 PM, Breno Leitao wrote:
Hello Ethan,
On Wed, Aug 06, 2025 at 09:55:05AM +0800, Ethan Zhao wrote:
On 8/4/2025 5:17 PM, Breno Leitao wrote:
Similarly to pci_dev_aer_stats_incr(), pci_print_aer() may be called
when dev->aer_info is NULL. Add a NULL check before proceeding to avoid
calling aer_ratelimit() with a NULL aer_info pointer, returning 1, which
does not rate limit, given this is fatal.
This prevents a kernel crash triggered by dereferencing a NULL pointer
in aer_ratelimit(), ensuring safer handling of PCI devices that lack
AER info. This change aligns pci_print_aer() with pci_dev_aer_stats_incr()
which already performs this NULL check.
The enqueue side has lock to protect the ring, but the dequeue side no lock
held.
The kfifo_get in
static void aer_recover_work_func(struct work_struct *work)
{
...
while (kfifo_get(&aer_recover_ring, &entry)) {
...
}
should be replaced by
kfifo_out_spinlocked()
The design seems not to need the lock on the reader side. There is just
one reader, which is the aer_recover_work. aer_recover_work runs
aer_recover_work_func(). So, if we just have one reader, we do not need
to protect the kfifo by spinlock, right?
Not exactly,
If the writer and reader are serialized, no lock is needed. However,
here the writer kfifo_in_spinlocked() and the system work queue task
aer_recover_work() cannot guarantee serialized execution.
@Bjorn, help to check it out.
Thanks,
Ethan>
In fact, the code documents it in the aer_recover_ring_lock.
/*
* Mutual exclusion for writers of aer_recover_ring, reader side don't
* need lock, because there is only one reader and lock is not needed
* between reader and writer.
*/
static DEFINE_SPINLOCK(aer_recover_ring_lock);