On Mon, May 11, 2026 at 07:21:33PM -0700, Jakub Kicinski wrote:
> This is an AI-generated review of your patch. The human sending this
> email has considered the AI review valid, or at least plausible.
> ---
> net: mana: force full-page RX buffers via ethtool private flag
> 
> This patch adds an ethtool private flag "full-page-rx" that forces one
> RX buffer per page, bypassing the page_pool fragment path, to address
> a 15-20% throughput regression seen on some 4K PAGE_SIZE ARM64
> platforms under high connection counts.  Toggling the flag triggers
> mana_detach() + mana_attach() to apply the new RX buffer layout.
> 
> > diff --git a/drivers/net/ethernet/microsoft/mana/mana_ethtool.c 
> > b/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
> > index a28ca461c135..dfd222de11d0 100644
> > --- a/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
> > +++ b/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
> 
> [ ... ]
> 
> > @@ -590,6 +609,88 @@ static int mana_get_link_ksettings(struct net_device 
> > *ndev,
> >     return 0;
> >  }
> >  
> 
> [ ... ]
> 
> > +static int mana_set_priv_flags(struct net_device *ndev, u32 priv_flags)
> > +{
> > +   struct mana_port_context *apc = netdev_priv(ndev);
> > +   u32 changed = apc->priv_flags ^ priv_flags;
> > +   u32 old_priv_flags = apc->priv_flags;
> > +   bool schedule_port_reset = false;
> > +   int err = 0;
> > +
> > +   if (!changed)
> > +           return 0;
> > +
> > +   /* Reject unknown bits */
> > +   if (priv_flags & ~GENMASK(MANA_PRIV_FLAG_MAX - 1, 0))
> > +           return -EINVAL;
> > +
> > +   if (changed & BIT(MANA_PRIV_FLAG_USE_FULL_PAGE_RXBUF)) {
> > +           apc->priv_flags = priv_flags;
> > +
> > +           if (!apc->port_is_up) {
> > +                   /* Port is down, flag updated to apply on next up
> > +                    * so just return.
> > +                    */
> > +                   return 0;
> > +           }
> > +
> > +           /* Pre-allocate buffers to prevent failure in mana_attach
> > +            * later
> > +            */
> > +           err = mana_pre_alloc_rxbufs(apc, ndev->mtu, apc->num_queues);
> > +           if (err) {
> > +                   netdev_err(ndev,
> > +                              "Insufficient memory for new allocations\n");
> > +                   apc->priv_flags = old_priv_flags;
> > +                   return err;
> > +           }
> > +
> > +           err = mana_detach(ndev, false);
> > +           if (err) {
> > +                   netdev_err(ndev, "mana_detach failed: %d\n", err);
> > +                   apc->priv_flags = old_priv_flags;
> > +
> > +                   /* Port is in an inconsistent state. Restore
> > +                    * 'port_is_up' so that queue reset work handler
> > +                    * can properly detach and re-attach.
> > +                    */
> > +                   apc->port_is_up = true;
> > +                   schedule_port_reset = true;
> > +                   goto out;
> > +           }
> > +
> > +           err = mana_attach(ndev);
> > +           if (err) {
> > +                   netdev_err(ndev, "mana_attach failed: %d\n", err);
> > +                   apc->priv_flags = old_priv_flags;
> > +
> > +                   /* Restore 'port_is_up' so the reset work handler
> > +                    * can properly detach/attach. Without this,
> > +                    * the handler sees port_is_up=false and skips
> > +                    * queue allocation, leaving the port dead.
> > +                    */
> > +                   apc->port_is_up = true;
> > +                   schedule_port_reset = true;
> > +           }
> > +   }
> > +
> > +out:
> > +   mana_pre_dealloc_rxbufs(apc);
> > +
> > +   if (schedule_port_reset)
> > +           queue_work(apc->ac->per_port_queue_reset_wq,
> > +                      &apc->queue_reset_work);
> > +
> > +   return err;
> > +}
> 
> [High]
> Can this failure path oops inside the reset work handler?
> 
> When mana_attach() fails after mana_detach() has already torn the vport
> down, apc->tx_qp has been set to NULL by mana_destroy_txq() and apc->rxqs
> has been cleared by mana_cleanup_port_context().  This code then forces
> apc->port_is_up back to true and queues apc->queue_reset_work:
> 
>     apc->port_is_up = true;
>     schedule_port_reset = true;
> 
> When mana_per_port_queue_reset_work_handler runs it calls
> mana_detach(ndev, false), which latches port_st_save = port_is_up = true
> and, because port_st_save is true, invokes mana_dealloc_queues().  The TX
> drain loop there does:
> 
>     txq = &apc->tx_qp[i].txq;
>     atomic_read(&txq->pending_sends);
> 
> with no NULL check on apc->tx_qp, and mana_destroy_vport further down
> iterates apc->rxqs[rxq_idx] while rxqs is NULL.  Would that produce a
> NULL pointer dereference in the workqueue worker whenever mana_attach()
> fails here (for example on an allocation failure in
> mana_init_port_context or mana_create_txq, or a transient HW command
> failure in mana_query_vport_cfg / mana_cfg_vport)?
> 
> The mana_detach() failure path above (goto out) reaches
> mana_dealloc_queues() through the same chain and looks to have the same
> exposure.
Hi Jakub,

Thanks for your comments, I have sent a separate fixes series to the
net tree that fixes the possible NULL pointer derefernce issue and also
makes the reset handler safe for the case where it runs after a failed
attach:
https://lore.kernel.org/all/[email protected]/


> 
> For comparison, mana_change_mtu() handles a mana_attach() failure by
> returning the error without scheduling a reset.  Would a similar
> treatment here avoid the asynchronous oops, or is there a reason the
> reset must be scheduled in this specific failure case?
> -- 
> pw-bot: cr

The full-page-rx private flag is intended to be driven by a udev rule
that fires automatically during VM provisioning on affected platforms.
If there is a transient failure, the VM fails to provision, requiring manual
intervention.The reset handler retries the attach, giving the port a
chance to recover to default config autonomously without intervention.

Regards
Dipayaan Roy

Reply via email to