Hi Jakub,

On Wed, Nov 26, 2025 at 08:05:41PM -0800, Jakub Kicinski wrote:
> On Sun, 23 Nov 2025 10:08:18 -0800 Dipayaan Roy wrote:
> > Implement .ndo_tx_timeout for MANA so any stalled TX queue can be detected
> > and a device-controlled port reset for all queues can be scheduled to an
> > ordered workqueue. The reset for all queues on stall detection is
> > recommended by the hardware team.
> > 
> > The change introduces a single ordered workqueue
> > "mana_per_port_queue_reset_wq" queuing one work_struct per port,
> > using WQ_UNBOUND | WQ_MEM_RECLAIM so stalled queue reset work can
> > run on any CPU and still make forward progress under memory
> > pressure.
> 
> And we need to be able to reset the NIC queue under memory pressure
> because.. ?  I could be wrong but I still find this unusual / defensive
> programming, if you could point me at some existing drivers that'd help.
>
I found these existing drivers using 'create_singlethread_workqueue':

drivers/net/ethernet/mellanox/mlx4/en_main.c
drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
drivers/net/ethernet/mellanox/mlx5/core/en_main.c

'create_singlethread_workqueue' in turn sets WQ_MEM_RECLAIM, as the
macros below show:

#define alloc_ordered_workqueue(fmt, flags, args...) \
        alloc_workqueue(fmt, WQ_UNBOUND | __WQ_ORDERED | (flags), 1, ##args)

...
#define create_singlethread_workqueue(name) \
        alloc_ordered_workqueue("%s", __WQ_LEGACY | WQ_MEM_RECLAIM, name)

I will switch to using create_singlethread_workqueue directly in the
next version instead of spelling out the flags explicitly.
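
To make the flag composition concrete, here is a small userspace sketch
that mirrors the two macros above. Everything prefixed demo_/DEMO_ is a
hypothetical stand-in: the flag values are placeholders (the real ones live
in include/linux/workqueue.h and differ between kernel versions), and the
mock allocator just returns the effective flag word instead of creating a
workqueue. It uses C99 __VA_ARGS__ rather than the kernel's GNU-style
named variadic arguments.

```c
#include <assert.h>

/* Placeholder bit values for illustration only, not the kernel's ABI. */
enum {
	DEMO_WQ_UNBOUND     = 1 << 1,
	DEMO_WQ_MEM_RECLAIM = 1 << 3,
	DEMO_WQ_ORDERED     = 1 << 17,	/* stands in for __WQ_ORDERED */
	DEMO_WQ_LEGACY      = 1 << 18,	/* stands in for __WQ_LEGACY */
};

/* Hypothetical mock of alloc_workqueue(): reports the flags it was given. */
static unsigned int demo_alloc_workqueue(const char *fmt, unsigned int flags,
					 int max_active, const char *name)
{
	(void)fmt;
	(void)max_active;
	(void)name;
	return flags;
}

/* Same shape as alloc_ordered_workqueue(): adds UNBOUND | ORDERED. */
#define demo_alloc_ordered_workqueue(fmt, flags, ...) \
	demo_alloc_workqueue(fmt, DEMO_WQ_UNBOUND | DEMO_WQ_ORDERED | (flags), \
			     1, __VA_ARGS__)

/* Same shape as create_singlethread_workqueue(): adds LEGACY | MEM_RECLAIM. */
#define demo_create_singlethread_workqueue(name) \
	demo_alloc_ordered_workqueue("%s", \
				     DEMO_WQ_LEGACY | DEMO_WQ_MEM_RECLAIM, name)

/* Effective flags for the reset workqueue when created via the helper. */
static unsigned int demo_reset_wq_flags(void)
{
	return demo_create_singlethread_workqueue("mana_per_port_queue_reset_wq");
}
```

With these stand-ins, the helper yields the same WQ_UNBOUND and
WQ_MEM_RECLAIM bits that v4 passed explicitly, plus the ordered and legacy
markers.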

 
> > @@ -3287,6 +3341,7 @@ static int mana_probe_port(struct mana_context *ac, int port_idx,
> >     ndev->min_mtu = ETH_MIN_MTU;
> >     ndev->needed_headroom = MANA_HEADROOM;
> >     ndev->dev_port = port_idx;
> > +   ndev->watchdog_timeo = 15 * HZ;
> 
> 5 sec is typical, off the top of my head
> 
Per our internal discussion, the HW team recommended a 15 second timeout
based on the FPGA reconfig scenario.

> > @@ -3647,6 +3717,11 @@ void mana_remove(struct gdma_dev *gd, bool suspending)
> >             free_netdev(ndev);
> >     }
> >  
> > +   if (ac->per_port_queue_reset_wq) {
> > +           destroy_workqueue(ac->per_port_queue_reset_wq);
> > +           ac->per_port_queue_reset_wq = NULL;
> > +   }
> 
> I think you're missing this cleanup in the failure path of mana_probe
Right, if all the ports fail to probe, the cleanup in mana_remove will be
skipped. I will fix this in v5.
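
A minimal userspace sketch of the failure-path cleanup pattern discussed
here, not the actual v5 change. All demo_* names are hypothetical, malloc()
and free() stand in for workqueue creation and destroy_workqueue(), and the
fail_from parameter exists only to simulate a port probe failure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct demo_wq { int unused; };

struct demo_ctx {
	struct demo_wq *per_port_queue_reset_wq;
	int num_ports;
};

/* Simulated port probe: ports with index >= fail_from fail. */
static int demo_probe_port(int port_idx, int fail_from)
{
	return port_idx >= fail_from ? -1 : 0;
}

/* Same guard-and-clear shape as the hunk quoted from mana_remove(). */
static void demo_destroy_reset_wq(struct demo_ctx *ac)
{
	if (ac->per_port_queue_reset_wq) {
		free(ac->per_port_queue_reset_wq);
		ac->per_port_queue_reset_wq = NULL;
	}
}

/* Probe that releases the workqueue itself when any port fails. */
static int demo_probe(struct demo_ctx *ac, int fail_from)
{
	int err = 0;
	int i;

	ac->per_port_queue_reset_wq = malloc(sizeof(*ac->per_port_queue_reset_wq));
	if (!ac->per_port_queue_reset_wq)
		return -1;

	for (i = 0; i < ac->num_ports; i++) {
		err = demo_probe_port(i, fail_from);
		if (err)
			goto out_destroy_wq;
	}
	return 0;

out_destroy_wq:
	/* Failure path: do not rely on the remove path to free the workqueue. */
	demo_destroy_reset_wq(ac);
	return err;
}
```

Because the destroy helper clears the pointer, calling it again from a
later remove path is harmless.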
> -- 
> pw-bot: cr

Thank you for the comments; I will address them in v5.

Regards
