On 2021/4/14 16:08, Lijun Pan wrote: > There are chances that napi_disable can be called twice by NIC driver. > This could generate deadlock. For example, > the first napi_disable will spin until NAPI_STATE_SCHED is cleared > by napi_complete_done, then set it again. > When napi_disable is called the second time, it will loop infinitely > because no dev->poll will be running to clear NAPI_STATE_SCHED. > > Though it is driver writer's responsibility to make sure it being > called only once, making napi_disable more robust does not hurt, not > to say it can prevent a buggy driver from crashing a system. > So, we check the napi state bit to make sure that if napi is already > disabled, we exit the call early enough to avoid spinning infinitely. > > Fixes: bea3348eef27 ("[NET]: Make NAPI polling independent of struct > net_device objects.") > Signed-off-by: Lijun Pan <lijunp...@gmail.com> > --- > v2: justify that this patch makes napi_disable more robust. > > net/core/dev.c | 18 ++++++++++++++++++ > 1 file changed, 18 insertions(+) > > diff --git a/net/core/dev.c b/net/core/dev.c > index 1f79b9aa9a3f..fa0aa212b7bb 100644 > --- a/net/core/dev.c > +++ b/net/core/dev.c > @@ -6830,6 +6830,24 @@ EXPORT_SYMBOL(netif_napi_add); > void napi_disable(struct napi_struct *n) > { > might_sleep(); > + > + /* make sure napi_disable() runs only once, > + * When napi is disabled, the state bits are like: > + * NAPI_STATE_SCHED (set by previous napi_disable) > + * NAPI_STATE_NPSVC (set by previous napi_disable) > + * NAPI_STATE_DISABLE (cleared by previous napi_disable) > + * NAPI_STATE_PREFER_BUSY_POLL (cleared by previous napi_complete_done) > + * NAPI_STATE_MISSED (cleared by previous napi_complete_done) > + */ > + > + if (napi_disable_pending(n)) > + return; > + if (test_bit(NAPI_STATE_SCHED, &n->state) && > + test_bit(NAPI_STATE_NPSVC, &n->state) && > + !test_bit(NAPI_STATE_MISSED, &n->state) && > + !test_bit(NAPI_STATE_PREFER_BUSY_POLL, &n->state)) > + return;
The NAPI_STATE_DISABLE is cleared at the end of napi_disable(), and if a buggy driver/hw triggers a interrupt and driver calls napi_schedule_irqoff(), which may set NAPI_STATE_MISSED if NAPI_STATE_SCHED is set(in napi_schedule_prep()), the above checking does not seem to handle it? > + > set_bit(NAPI_STATE_DISABLE, &n->state); > > while (test_and_set_bit(NAPI_STATE_SCHED, &n->state)) >