> -----Original Message-----
> From: Alex Dvoretsky <[email protected]>
> Sent: Wednesday, March 11, 2026 9:45 PM
> To: [email protected]
> Cc: [email protected]; Fijalkowski, Maciej
> <[email protected]>; Loktionov, Aleksandr
> <[email protected]>; Nguyen, Anthony L
> <[email protected]>; Kitszel, Przemyslaw
> <[email protected]>; [email protected];
> [email protected]; Alex Dvoretsky <[email protected]>
> Subject: [PATCH net v2] igb: remove napi_synchronize() in igb_down()
> 
> When an AF_XDP zero-copy application terminates abruptly (e.g., kill -9),
> the XSK buffer pool is destroyed but NAPI polling continues.
> igb_clean_rx_irq_zc() repeatedly returns the full budget, preventing
> napi_complete_done() from clearing NAPI_STATE_SCHED.
> 
> igb_down() calls napi_synchronize() before napi_disable() for each
> queue vector. napi_synchronize() spins waiting for NAPI_STATE_SCHED to
> clear, which never happens. igb_down() blocks indefinitely, the TX
> watchdog fires, and the TX queue remains permanently stalled.
> 
> napi_disable() already handles this correctly: it sets NAPI_STATE_DISABLE
> before waiting. After a full-budget poll, __napi_poll() checks
> napi_disable_pending(); if it is set, it forces completion and clears
> NAPI_STATE_SCHED, breaking the loop that napi_synchronize() cannot.
> 
> napi_synchronize() was added in commit 41f149a285da ("igb: Fix
> possible panic caused by Rx traffic arrival while interface is down").
> napi_disable() provides stronger guarantees: it prevents further
> scheduling and waits for any active poll to exit.
> Other Intel drivers (ixgbe, ice, i40e) use napi_disable() without a
> preceding napi_synchronize() in their down paths.
> 
> Remove redundant napi_synchronize() call.
> 
> Fixes: 2c6196013f84 ("igb: Add AF_XDP zero-copy Rx support")
> Cc: [email protected]
> Signed-off-by: Alex Dvoretsky <[email protected]>
> ---
> Thanks for the suggestion, Maciej. I tested removing napi_synchronize()
> and it fixes the issue cleanly — napi_disable() handles the stuck poll
> via NAPI_STATE_DISABLE without needing any hot-path changes.
> 
> v2:
>   - Replaced 3-patch series with single napi_synchronize() removal,
>     per Maciej Fijalkowski's suggestion. napi_disable() handles the
>     stuck NAPI poll via NAPI_STATE_DISABLE, making the __IGB_DOWN
>     checks in igb_clean_rx_irq_zc() and igb_tx_timeout(), and the
>     transition guards in igb_xdp_setup(), all unnecessary.
>   - Tested on Intel I210 (igb) with AF_XDP zero-copy: full E2E
>     traffic suite, graceful shutdown, and 5x kill-9 stress cycles.
>     Zero tx_timeout events.
> 
>  drivers/net/ethernet/intel/igb/igb_main.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
> index 12e8e30d8a2d..a1b3c5e4f7d2 100644
> --- a/drivers/net/ethernet/intel/igb/igb_main.c
> +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> @@ -2203,7 +2203,6 @@ void igb_down(struct igb_adapter *adapter)
> 
>       for (i = 0; i < adapter->num_q_vectors; i++) {
>               if (adapter->q_vector[i]) {
> -                     napi_synchronize(&adapter->q_vector[i]->napi);
>                       igb_set_queue_napi(adapter, i, NULL);
>                       napi_disable(&adapter->q_vector[i]->napi);
OK. But I'd swap the two remaining calls so we don't modify any per-queue
NAPI plumbing while the poll could still be running.
What do you think?

-       igb_set_queue_napi(adapter, i, NULL);
-       napi_disable(&adapter->q_vector[i]->napi);
+       napi_disable(&adapter->q_vector[i]->napi);
+       igb_set_queue_napi(adapter, i, NULL);

Reviewed-by: Aleksandr Loktionov <[email protected]>

>               }
> --
> 2.51.0
