On 5/25/26 10:08 AM, Dipayaan Roy wrote:
> When mana_per_port_queue_reset_work_handler() runs after a previous
> detach succeeded but attach failed, the port is left in a detached
> state with apc->tx_qp and apc->rxqs already freed. Calling
> mana_detach() again unconditionally leads to NULL pointer dereferences
> during queue teardown.
> 
> Add an early exit in mana_detach() when the port is already in
> detached state (!netif_device_present) for non-close callers, making
> it safe to call idempotently. This allows the queue reset handler and
> other recovery paths to simply retry mana_attach() without redundant
> teardown.
> 
> Fixes: 3b194343c250 ("net: mana: Implement ndo_tx_timeout and serialize queue 
> resets per port.")
> Reviewed-by: Haiyang Zhang <[email protected]>
> Signed-off-by: Dipayaan Roy <[email protected]>
> ---
>  drivers/net/ethernet/microsoft/mana/mana_en.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c 
> b/drivers/net/ethernet/microsoft/mana/mana_en.c
> index 0582803907a8..1e1ad2795c3c 100644
> --- a/drivers/net/ethernet/microsoft/mana/mana_en.c
> +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
> @@ -3350,6 +3350,12 @@ int mana_detach(struct net_device *ndev, bool 
> from_close)
>  
>       ASSERT_RTNL();
>  
> +     /* If already detached (indicates detach succeeded but attach failed
> +      * previously). Now skip mana detach and just retry mana_attach.
> +      */
> +     if (!from_close && !netif_device_present(ndev))
> +             return 0;
> +
>       apc->port_st_save = apc->port_is_up;
>       apc->port_is_up = false;

sashiko(gemini) notes the above can lead to different race:

---
Can this early return cause state machine corruption by bypassing the
updates
to apc->port_st_save?
Consider this sequence:
1. queue_reset_work runs, mana_detach() succeeds (apc->port_st_save = true,
   apc->port_is_up = false), but mana_attach() fails.
2. The admin brings the interface down (ip link set dev eth0 down), skipping
   mana_close() since apc->port_is_up is false.
3. The admin changes the MTU, triggering mana_change_mtu() which calls
   mana_detach() followed by mana_attach().
4. mana_detach() hits this new early return, preserving
   apc->port_st_save == true.
When mana_attach() runs, it sees apc->port_st_save == true and allocates
queues, setting apc->vport_use_count = 1 and apc->port_is_up = true, even
though the interface is administratively down.
If the admin then brings the interface up, mana_open() will unconditionally
call mana_alloc_queues(). That function calls mana_cfg_vport(), which will
return -EBUSY because apc->vport_use_count is already 1.
This leaves mana_open() failing and the interface down. Since the interface
is already down, trying to bring it down again is a no-op, meaning
mana_close() is never called to clean up the orphaned queues.
Does this sequence permanently brick the port until the driver is reloaded?
---

I think you need to be more restrictive in the early return check.

/P


Reply via email to