> -----Original Message-----
> From: Alex Dvoretsky <[email protected]>
> Sent: Wednesday, March 11, 2026 9:45 PM
> To: [email protected]
> Cc: [email protected]; Fijalkowski, Maciej
> <[email protected]>; Loktionov, Aleksandr
> <[email protected]>; Nguyen, Anthony L
> <[email protected]>; Kitszel, Przemyslaw
> <[email protected]>; [email protected];
> [email protected]; Alex Dvoretsky <[email protected]>
> Subject: [PATCH net v2] igb: remove napi_synchronize() in igb_down()
>
> When an AF_XDP zero-copy application terminates abruptly (e.g.,
> kill -9), the XSK buffer pool is destroyed but NAPI polling continues.
> igb_clean_rx_irq_zc() repeatedly returns the full budget, preventing
> napi_complete_done() from clearing NAPI_STATE_SCHED.
>
> igb_down() calls napi_synchronize() before napi_disable() for each
> queue vector. napi_synchronize() spins waiting for NAPI_STATE_SCHED to
> clear, which never happens. igb_down() blocks indefinitely, the TX
> watchdog fires, and the TX queue remains permanently stalled.
>
> napi_disable() already handles this correctly: it sets
> NAPI_STATE_DISABLE. After a full-budget poll, __napi_poll() checks
> napi_disable_pending(). If set, it forces completion and clears
> NAPI_STATE_SCHED, breaking the loop that napi_synchronize() cannot.
>
> napi_synchronize() was added in commit 41f149a285da ("igb: Fix
> possible panic caused by Rx traffic arrival while interface is
> down"). napi_disable() provides stronger guarantees: it prevents
> further scheduling and waits for any active poll to exit.
> Other Intel drivers (ixgbe, ice, i40e) use napi_disable() without a
> preceding napi_synchronize() in their down paths.
>
> Remove redundant napi_synchronize() call.
>
> Fixes: 2c6196013f84 ("igb: Add AF_XDP zero-copy Rx support")
> Cc: [email protected]
> Signed-off-by: Alex Dvoretsky <[email protected]>
> ---
> Thanks for the suggestion, Maciej.
> I tested removing napi_synchronize() and it fixes the issue cleanly:
> napi_disable() handles the stuck poll via NAPI_STATE_DISABLE without
> needing any hot-path changes.
>
> v2:
> - Replaced 3-patch series with single napi_synchronize() removal,
>   per Maciej Fijalkowski's suggestion. napi_disable() handles the
>   stuck NAPI poll via NAPI_STATE_DISABLE, making the __IGB_DOWN
>   checks in igb_clean_rx_irq_zc() and igb_tx_timeout(), and the
>   transition guards in igb_xdp_setup(), all unnecessary.
> - Tested on Intel I210 (igb) with AF_XDP zero-copy: full E2E
>   traffic suite, graceful shutdown, and 5x kill-9 stress cycles.
>   Zero tx_timeout events.
>
>  drivers/net/ethernet/intel/igb/igb_main.c | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/intel/igb/igb_main.c
> b/drivers/net/ethernet/intel/igb/igb_main.c
> index 12e8e30d8a2d..a1b3c5e4f7d2 100644
> --- a/drivers/net/ethernet/intel/igb/igb_main.c
> +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> @@ -2203,7 +2203,6 @@ void igb_down(struct igb_adapter *adapter)
>
>  	for (i = 0; i < adapter->num_q_vectors; i++) {
>  		if (adapter->q_vector[i]) {
> -			napi_synchronize(&adapter->q_vector[i]->napi);
>  			igb_set_queue_napi(adapter, i, NULL);
>  			napi_disable(&adapter->q_vector[i]->napi);

Ok. But I’d swap the two remaining calls so we don’t modify any
per-queue NAPI plumbing while the poll could still be running. What do
you think?
-			igb_set_queue_napi(adapter, i, NULL);
-			napi_disable(&adapter->q_vector[i]->napi);
+			napi_disable(&adapter->q_vector[i]->napi);
+			igb_set_queue_napi(adapter, i, NULL);

Reviewed-by: Aleksandr Loktionov <[email protected]>

>  		}
> --
> 2.51.0
