Thank you for your contribution! Sashiko AI review found 4 potential issue(s) 
to consider:

New issues:
- [High] Static CQ moderation is silently disabled if only one parameter is 
configured.
- [High] Unconditional use of `GDMA_MESSAGE_V3` may break queue creation on 
older firmware.
- [High] Data race in DIM updates due to calling net_dim() after 
napi_complete_done().

Pre-existing issues:
- [High] ethtool coalesce change can leave the interface permanently 
detached/down on failure.
--

commit ed18c14aae7553debe324941814e80fe58d99fe0
Author: Haiyang Zhang <[email protected]>

net: mana: Add Interrupt Moderation support

Add Static and Dynamic Interrupt Moderation (DIM) support for Rx and Tx.
Update queue creation procedure with new data struct with the related
settings. Update ethtool handler to get/set the moderation settings.

> diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c 
> b/drivers/net/ethernet/microsoft/mana/mana_en.c
> --- a/drivers/net/ethernet/microsoft/mana/mana_en.c
> +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
> @@ -1551,6 +1551,9 @@ int mana_create_wq_obj(struct mana_port_context *apc,
>  
>       mana_gd_init_req_hdr(&req.hdr, MANA_CREATE_WQ_OBJ,
>                            sizeof(req), sizeof(resp));
> +
> +     req.hdr.req.msg_version = GDMA_MESSAGE_V3;
> +     req.hdr.resp.msg_version = GDMA_MESSAGE_V2;

[Severity: High]
Could this unconditional upgrade to GDMA_MESSAGE_V3 break queue creation on
older firmware?

If the driver is loaded on an older host where the hardware firmware does
not support V3 messages (for example, lacking DIM capabilities), it seems
the hardware might reject the MANA_CREATE_WQ_OBJ request. Would it be
safer to conditionally apply this message version upgrade based on hardware
capabilities?

[ ... ]

> @@ -2271,7 +2337,13 @@ static int mana_cq_handler(void *context, struct 
> gdma_queue *gdma_queue)
>       if (w < cq->budget) {
>               mana_gd_ring_cq(gdma_queue, SET_ARM_BIT);
>               cq->work_done_since_doorbell = 0;
> -             napi_complete_done(&cq->napi, w);
> +
> +             if (napi_complete_done(&cq->napi, w)) {
> +                     if (cq->type == MANA_CQ_TYPE_RX)
> +                             mana_update_rx_dim(cq);
> +                     else
> +                             mana_update_tx_dim(cq);
> +             }

[Severity: High]
Does calling net_dim() after napi_complete_done() introduce a data race?

Since napi_complete_done() clears the NAPI_STATE_SCHED bit and
relinquishes the NAPI polling context, and the hardware interrupt
was already armed via mana_gd_ring_cq() just above, a new interrupt
could fire immediately.

If another CPU schedules NAPI and executes mana_cq_handler() concurrently,
it seems internal state in net_dim() could be corrupted. Additionally,
could this lead to torn reads of 64-bit queue statistics on 32-bit systems
since they are read outside NAPI protection?

[ ... ]

> @@ -2780,6 +2862,10 @@ static struct mana_rxq *mana_create_rxq(struct 
> mana_port_context *apc,
>       cq_spec.queue_size = cq->gdma_cq->queue_size;
>       cq_spec.modr_ctx_id = 0;
>       cq_spec.attached_eq = cq->gdma_cq->cq.parent->id;
> +     cq_spec.req_cq_moderation = apc->rx_dim_enabled ||
> +             (apc->intr_modr_rx_usec && apc->intr_modr_rx_comp);
> +     cq_spec.cq_moderation_usec = apc->intr_modr_rx_usec;
> +     cq_spec.cq_moderation_comp = apc->intr_modr_rx_comp;

[Severity: High]
Does this logic silently disable static moderation if only one parameter is
configured?

If a user disables adaptive moderation and requests timer-based moderation
without a frame limit (e.g. adaptive-rx off rx-usecs 50 rx-frames 0), the
logical AND evaluates to false. This appears to cause the driver to silently
disable hardware moderation completely instead of applying the requested
rx-usecs.

(Note: The same pattern appears in mana_create_txq() for TX).

> diff --git a/drivers/net/ethernet/microsoft/mana/mana_ethtool.c 
> b/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
> --- a/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
> +++ b/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
[ ... ]
> @@ -447,10 +517,46 @@ static int mana_set_coalesce(struct net_device *ndev,
[ ... ]
> +     if (modr_changed || dim_changed) {
> +             err = mana_detach(ndev, false);
> +             if (err) {
> +                     netdev_err(ndev, "mana_detach failed: %d\n", err);
> +                     goto restore_modr;
> +             }
> +
> +             err = mana_attach(ndev);
> +             if (err) {
> +                     netdev_err(ndev, "mana_attach failed: %d\n", err);
> +                     goto restore_modr;
> +             }
> +     }
> +
>       return err;
>  
> +restore_modr:
> +     apc->cqe_coalescing_enable = saved_cqe_coalescing_enable;

[Severity: High]
This is a pre-existing issue, but does this error path leave the interface
permanently detached?

If mana_detach() succeeds but mana_attach() fails (for example, due to an
allocation failure), the function jumps to restore_modr. This restores the
software tracking variables but does not appear to revert the hardware state.

This could leave the network interface completely broken and detached from
the system until it is manually brought down and up again.

-- 
Sashiko AI review ยท 
https://sashiko.dev/#/patchset/[email protected]?part=1

Reply via email to