Thank you for your contribution! Sashiko AI review found 5 potential issue(s) 
to consider:
- [High] Calling `mana_update_rx_dim` and `mana_update_tx_dim` after 
`napi_complete_done` violates NAPI serialization, leading to data races and 
corruption of the `net_dim` state.
- [High] Unconditional use of the `GDMA_MESSAGE_V3` format and 
`req_cq_moderation` flag breaks WQ object creation on older MANA firmwares.
- [High] NAPI is enabled before the DIM work struct is initialized, allowing 
NAPI to run on an uninitialized work struct.
- [High] Lockless access to the DIM enable flags lacks memory barriers, 
potentially exposing uninitialized DIM structures to NAPI.
- [High] TX DIM incorrectly uses enqueued packets instead of completed packets, 
and reads concurrent stats unsafely.
--

commit 275fa81625ff24a83b95ff846fea9070fb3193d8
Author: Haiyang Zhang <[email protected]>
Date:   Thu Jun 11 19:02:22 2026 +0000

    net: mana: Add Interrupt Moderation support
    
    This commit adds Static and Dynamic Interrupt Moderation (DIM) support for
    Rx and Tx. It updates the queue creation procedure, adds functions to
    collect stats for DIM, and provides ethtool handler support to get/set
    the moderation settings.

> diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c 
> b/drivers/net/ethernet/microsoft/mana/mana_en.c
> index 26aef21c6c2c8..d92e022d8533c 100644
> --- a/drivers/net/ethernet/microsoft/mana/mana_en.c
> +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
> @@ -1579,6 +1579,9 @@ int mana_create_wq_obj(struct mana_port_context *apc,
>  
>       mana_gd_init_req_hdr(&req.hdr, MANA_CREATE_WQ_OBJ,
>                            sizeof(req), sizeof(resp));
> +
> +     req.hdr.req.msg_version = GDMA_MESSAGE_V3;
> +     req.hdr.resp.msg_version = GDMA_MESSAGE_V2;
>       req.vport = vport;
>       req.wq_type = wq_type;
>       req.wq_gdma_region = wq_spec->gdma_region;
> @@ -1587,6 +1590,9 @@ int mana_create_wq_obj(struct mana_port_context *apc,
>       req.cq_size = cq_spec->queue_size;
>       req.cq_moderation_ctx_id = cq_spec->modr_ctx_id;
>       req.cq_parent_qid = cq_spec->attached_eq;
> +     req.req_cq_moderation = cq_spec->req_cq_moderation;
> +     req.cq_moderation_comp = cq_spec->cq_moderation_comp;
> +     req.cq_moderation_usec = cq_spec->cq_moderation_usec;

[Severity: High]
Does this break queue creation on older Azure host firmwares?

The code unconditionally bumps the WQ object creation request to use
GDMA_MESSAGE_V3 and passes the req_cq_moderation flag. If the firmware does
not support GDMA_PF_CAP_FLAG_1_DYN_INTERRUPT_MODERATION, could the firmware
reject the queue creation request and break device initialization entirely?

[ ... ]

> +static void mana_update_tx_dim(struct mana_cq *cq)
> +{
> +     struct mana_port_context *apc = netdev_priv(cq->txq->ndev);
> +     struct dim_sample dim_sample = {};
> +     struct mana_txq *txq = cq->txq;
> +
> +     if (!apc->tx_dim_enabled)
> +             return;
> +
> +     dim_update_sample(READ_ONCE(cq->dim_event_ctr), txq->stats.packets,
> +                       txq->stats.bytes, &dim_sample);
> +     net_dim(&cq->dim, &dim_sample);
> +}

[Severity: High]
Does this measure the CPU enqueue rate rather than the hardware completion
rate?

It appears that txq->stats.packets is incremented in mana_start_xmit()
upon enqueue, not during completion in mana_poll_tx_cq(). Dynamic interrupt
moderation relies on measuring the actual hardware completion rate to
function correctly.

Additionally, is it safe to read txq->stats.packets here without the
u64_stats_fetch_begin() retry loop? Reading a 64-bit statistic locklessly
without the synchronization primitives could result in torn reads on 32-bit
systems.

[ ... ]

> @@ -2324,7 +2434,13 @@ static int mana_cq_handler(void *context, struct 
> gdma_queue *gdma_queue)
>       if (w < cq->budget) {
>               mana_gd_ring_cq(gdma_queue, SET_ARM_BIT);
>               cq->work_done_since_doorbell = 0;
> -             napi_complete_done(&cq->napi, w);
> +
> +             if (napi_complete_done(&cq->napi, w)) {
> +                     if (cq->type == MANA_CQ_TYPE_RX)
> +                             mana_update_rx_dim(cq);
> +                     else
> +                             mana_update_tx_dim(cq);
> +             }

[Severity: High]
Is it safe to call mana_update_rx_dim() and mana_update_tx_dim() after
napi_complete_done()?

Since napi_complete_done() ends the NAPI polling cycle and relinquishes
mutual exclusion, and the hardware interrupt was already re-armed by
mana_gd_ring_cq() right before this, couldn't a new packet trigger an
interrupt immediately? This could allow NAPI to be scheduled on another CPU
concurrently while this CPU is still running the DIM update, potentially
corrupting the non-thread-safe net_dim() state machine.

[ ... ]

> @@ -2562,6 +2685,9 @@ static int mana_create_txq(struct mana_port_context 
> *apc,
>               napi_enable_locked(&cq->napi);
>               txq->napi_initialized = true;
>  
> +             INIT_WORK(&cq->dim.work, mana_tx_dim_work);
> +             cq->dim.mode = DIM_CQ_PERIOD_MODE_START_FROM_EQE;

[Severity: High]
Could this allow NAPI to schedule an uninitialized work_struct?

napi_enable_locked() is called right before INIT_WORK(). Once NAPI is
enabled, it can be scheduled to run immediately (for example, via netpoll).
If NAPI executes before INIT_WORK completes, mana_update_tx_dim() would
pass the uninitialized cq->dim to net_dim(), which could cause a kernel
panic when it attempts to call schedule_work().

This pattern also occurs in mana_create_rxq().

> diff --git a/drivers/net/ethernet/microsoft/mana/mana_ethtool.c 
> b/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
> index 94e658d07a27e..67a2b282ff4da 100644
> --- a/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
> +++ b/drivers/net/ethernet/microsoft/mana/mana_ethtool.c

[ ... ]

> @@ -440,18 +474,123 @@ static int mana_set_coalesce(struct net_device *ndev,
> +             if (enable_rx_dim)
> +                     apc->rx_dim_enabled = true;
> +             if (enable_tx_dim)
> +                     apc->tx_dim_enabled = true;

[Severity: High]
Could the lack of memory barriers here expose uninitialized DIM structures
to NAPI?

The driver initializes DIM via mana_dim_change() and then sets
apc->rx_dim_enabled to true. Without an smp_store_release() here and a
corresponding smp_load_acquire() in mana_update_rx_dim(), weakly-ordered
CPUs like ARM64 might reorder the stores. Concurrently, NAPI polling might
observe the flag as true before the initialization is fully visible in memory,
potentially invoking net_dim() on garbage memory.

-- 
Sashiko AI review ยท 
https://sashiko.dev/#/patchset/[email protected]?part=1

Reply via email to