From: AI Reviewer <[email protected]>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
Full review at: https://netdev-ai.bots.linux.dev/sashiko/
---
net: mana: Create separate EQs for each vPort

This patch moves EQ allocation from per-device to per-vPort, exports
mana_create_eq()/mana_destroy_eq() for use by the RDMA driver, takes
a vport reference for RSS QPs, and adds cross-port PD-sharing
rejection for raw QPs along with serialization between
mana_set_channels() and the RDMA vport configuration.

> diff --git a/drivers/infiniband/hw/mana/main.c 
> b/drivers/infiniband/hw/mana/main.c
> index ac5e75dd3494..f8a9013f0ca3 100644
> --- a/drivers/infiniband/hw/mana/main.c
> +++ b/drivers/infiniband/hw/mana/main.c
> @@ -40,13 +42,27 @@ int mana_ib_cfg_vport(struct mana_ib_dev *dev, u32 port, 
> struct mana_ib_pd *pd,
>  
>       pd->vport_use_count++;
>       if (pd->vport_use_count > 1) {
> +             /* Reject cross-port PD sharing. EQs and vport config
> +              * are per-port, so the PD must stay bound to the port
> +              * that was configured on the first raw QP creation.
> +              */
> +             if (pd->vport_port != port) {
> +                     pd->vport_use_count--;
> +                     mutex_unlock(&pd->vport_mutex);
> +                     ibdev_dbg(&dev->ib_dev,
> +                               "PD already bound to port %u\n",
> +                               pd->vport_port);
> +                     return -EINVAL;
> +             }
>               ibdev_dbg(&dev->ib_dev,
>                         "Skip as this PD is already configured vport\n");
>               mutex_unlock(&pd->vport_mutex);
>               return 0;
>       }
>  
> -     err = mana_cfg_vport(mpc, pd->pdn, doorbell_id);
> +     pd->vport_port = port;
> +
> +     err = mana_cfg_vport(mpc, pd->pdn, doorbell_id, true);
>       if (err) {
>               pd->vport_use_count--;
>               mutex_unlock(&pd->vport_mutex);

[Low]
In the failure path of mana_ib_cfg_vport(), if mana_cfg_vport() or the
later mana_create_eq() fails, vport_use_count is rolled back to 0 but
pd->vport_port keeps the value just written.

Is the contract that pd->vport_port is only meaningful while
vport_use_count > 0?  That invariant is not documented in the struct
mana_ib_pd comment, and a future reader who consults pd->vport_port
without first checking vport_use_count under vport_mutex would see stale
data.  Would it be worth either resetting pd->vport_port on the rollback
path or documenting the rule explicitly?

> @@ -55,15 +71,23 @@ int mana_ib_cfg_vport(struct mana_ib_dev *dev, u32 port, 
> struct mana_ib_pd *pd,
>               return err;
>       }
>  
> -     mutex_unlock(&pd->vport_mutex);
>  
> -     pd->tx_shortform_allowed = mpc->tx_shortform_allowed;
> -     pd->tx_vp_offset = mpc->tx_vp_offset;
> +     err = mana_create_eq(mpc);
> +     if (err) {
> +             mana_uncfg_vport(mpc);
> +             pd->vport_use_count--;
> +     } else {
> +             pd->tx_shortform_allowed = mpc->tx_shortform_allowed;
> +             pd->tx_vp_offset = mpc->tx_vp_offset;
> +     }

[ ... ]

> diff --git a/drivers/infiniband/hw/mana/qp.c b/drivers/infiniband/hw/mana/qp.c
> index 0fbcf449c134..108ec4c5ce51 100644
> --- a/drivers/infiniband/hw/mana/qp.c
> +++ b/drivers/infiniband/hw/mana/qp.c
> @@ -155,6 +156,18 @@ static int mana_ib_create_qp_rss(struct ib_qp *ibqp, 
> struct ib_pd *pd,
>  
>       qp->port = port;
>  
> +     /* Take a reference on the vport to ensure EQs outlive this QP.
> +      * The vport must already be configured by a raw QP.
> +      */
> +     mutex_lock(&mana_pd->vport_mutex);
> +     if (!mana_pd->vport_use_count) {
> +             mutex_unlock(&mana_pd->vport_mutex);
> +             ret = -EINVAL;
> +             goto fail;
> +     }
> +     mana_pd->vport_use_count++;
> +     mutex_unlock(&mana_pd->vport_mutex);
> +

[Critical]
The cross-port PD-sharing check added to mana_ib_cfg_vport() is not
mirrored here.  mana_ib_create_qp_rss() takes the user-supplied port
from ucmd.port and only checks that vport_use_count is non-zero, so the
RSS path will happily attach to a different port than the one bound to
the PD by the first raw QP.

If a user creates a raw QP on port A (which sets pd->vport_port=A and
allocates mpc_A->eqs via mana_create_eq()), then creates an RSS QP on
the same PD with ucmd.port=B where port B has no Ethernet up and no
RDMA activity, mpc_B->eqs is NULL.  The subsequent code in this same
function:

        eq = &mpc->eqs[cq->comp_vector % mpc->num_queues];
        cq_spec.attached_eq = eq->eq->id;

would then dereference NULL through mpc_B->eqs.

If port B does have Ethernet up, mpc_B->eqs is owned by the Ethernet
driver and the RSS QP attaches to those EQs.  When the QP is destroyed
mana_ib_destroy_qp_rss() calls mana_ib_uncfg_vport(mdev, pd, qp->port=B),
and once pd->vport_use_count reaches 0 mana_ib_uncfg_vport() will run
mana_destroy_eq(mpc_B) on Ethernet's live EQs and call mana_uncfg_vport
on a port whose apc->vport_use_count was never incremented by this PD,
tripping the WARN_ON in mana_uncfg_vport() and corrupting Ethernet's
vport state.  Meanwhile port A's EQs and vport configuration are leaked
because nothing on this PD will destroy them.

Should mana_ib_create_qp_rss() apply the same pd->vport_port == port
check that mana_ib_cfg_vport() now performs, before bumping
vport_use_count?

[Medium]
The new "vport must already be configured by a raw QP" requirement is a
visible uAPI change.  Before this patch, EQs were allocated device-wide
on mana_context at probe time, so an RSS QP could be created without any
prior raw QP on the PD.  After the patch, applications that legitimately
use only RSS QPs (RX-only RDMA workloads) will see -EINVAL here unless
they also create a raw QP first.

Was breaking the RSS-only case considered, and would lazy EQ creation on
the first QP regardless of type be feasible to avoid the
incompatibility?

>       for (i = 0; i < ind_tbl_size; i++) {
>               struct mana_obj_spec wq_spec = {};
>               struct mana_obj_spec cq_spec = {};
> @@ -171,13 +184,13 @@ static int mana_ib_create_qp_rss(struct ib_qp *ibqp, 
> struct ib_pd *pd,
>               cq_spec.gdma_region = cq->queue.gdma_region;
>               cq_spec.queue_size = cq->cqe * COMP_ENTRY_SIZE;
>               cq_spec.modr_ctx_id = 0;
> -             eq = &mpc->ac->eqs[cq->comp_vector];
> +             eq = &mpc->eqs[cq->comp_vector % mpc->num_queues];
>               cq_spec.attached_eq = eq->eq->id;

[ ... ]

> diff --git a/drivers/net/ethernet/microsoft/mana/mana_ethtool.c 
> b/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
> index 04350973e19e..4633acc976f0 100644
> --- a/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
> +++ b/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
> @@ -462,10 +467,22 @@ static int mana_set_channels(struct net_device *ndev,
>       unsigned int new_count = channels->combined_count;
>       unsigned int old_count = apc->num_queues;
>       int err;
>  
> +     /* Set channel_changing to block RDMA from grabbing the vport
> +      * during the detach/attach window. mana_cfg_vport() checks
> +      * this flag under vport_mutex and returns -EBUSY if set.
> +      */
> +     mutex_lock(&apc->vport_mutex);
> +     if (!apc->port_is_up && apc->vport_use_count) {
> +             mutex_unlock(&apc->vport_mutex);
> +             return -EBUSY;
> +     }
> +     apc->channel_changing = true;
> +     mutex_unlock(&apc->vport_mutex);

[Low]
The commit message says:

    When the port is down, apc->vport_mutex is held for the entire
    operation since mana_detach()/mana_attach() do not take vport_mutex
    in that case. When the port is up, Ethernet owns the vport
    exclusively so no additional locking is needed.

Does this description match the code?  mana_set_channels() drops
apc->vport_mutex immediately after setting channel_changing=true and
runs mana_pre_alloc_rxbufs(), mana_detach() and mana_attach() without
the mutex held, then re-acquires the mutex only at the end to clear the
flag.

The actual serialization mechanism is the channel_changing boolean
checked by mana_cfg_vport() under vport_mutex, not a long mutex hold.
Could the commit message be updated to match the implemented design so
that future readers do not assume a different invariant?
-- 
This is an AI-generated review.


Reply via email to