From: Simon Horman <[email protected]>
Date: Fri,  8 May 2026 15:08:00 +0100

> From: 'Simon Horman' <[email protected]>
> 
> This is an AI-generated review of your patch. The human sending this
> email has considered the AI review valid, or at least plausible.
> Full review at: https://sashiko.dev
> ---
> ice: implement Rx queue management ops
> 
> This commit wires the ice driver into the netdev queue management ops
> framework, reusing the existing per-queue disable/reconfig/enable paths
> originally added for XSk. With it, ice reportedly passes the io_uring
> zcrx selftests.
> 
>> diff --git a/drivers/net/ethernet/intel/ice/ice_base.c 
>> b/drivers/net/ethernet/intel/ice/ice_base.c
>> index 1add82d894bb..4e0b8895c303 100644
>> --- a/drivers/net/ethernet/intel/ice/ice_base.c
>> +++ b/drivers/net/ethernet/intel/ice/ice_base.c
> 
> [ ... ]
> 
>> @@ -1577,3 +1593,81 @@ int ice_qp_ena(struct ice_vsi *vsi, u16 q_idx)
>>
>>      return fail;
>>  }
>> +
>> +/**
>> + * ice_qp_ena - Enables a queue pair
>> + * @vsi: VSI of interest
>> + * @q_idx: ring index in array
>> + *
>> + * Returns 0 on success, negative on failure.
>> + */
>> +int ice_qp_ena(struct ice_vsi *vsi, u16 q_idx)
>> +{
>> +    return __ice_qp_ena(vsi, q_idx, ice_vsi_cfg_single_rxq(vsi, q_idx));
>> +}
>> +
>> +static int ice_queue_start(struct net_device *dev,
>> +                       struct netdev_queue_config *qcfg,
>> +                       void *per_queue_mem, int idx)
>> +{
>> +    const struct ice_netdev_priv *priv = netdev_priv(dev);
>> +    struct ice_rx_ring *real = priv->vsi->rx_rings[idx];
>> +    struct ice_rx_ring *new = per_queue_mem;
>> +    struct napi_struct *napi;
>> +    int ret;
>> +
>> +    real->pp = new->pp;
>> +    real->rx_fqes = new->rx_fqes;
>> +    real->hdr_fqes = new->hdr_fqes;
>> +    real->hdr_pp = new->hdr_pp;
>> +
>> +    real->hdr_truesize = new->hdr_truesize;
>> +    real->truesize = new->truesize;
>> +    real->rx_hdr_len = new->rx_hdr_len;
>> +    real->rx_buf_len = new->rx_buf_len;
>> +
>> +    memcpy(&real->xdp_rxq, &new->xdp_rxq, sizeof(new->xdp_rxq));
>> +
>> +    ret = ice_setup_rx_ctx(real);
>> +    if (ret)
>> +            return ret;
> 
> Simon says: I believe that the issue flagged immediately below is
>             a false positive; the cleanup logic unwinds things correctly.

Yup, our Sashiko also had concerns about this piece, but I rechecked and
seems like it's a FP.

> 
>           However, the two other issues flagged further down do seem to be
>           of concern.
> 
> ice_queue_start() eagerly copies new->pp, new->rx_fqes, new->hdr_pp,
> new->hdr_fqes and the entire new->xdp_rxq into real before ice_setup_rx_ctx()
> has succeeded.

[...]

> 
>> +
>> +    napi = &real->q_vector->napi;
>> +
>> +    page_pool_enable_direct_recycling(real->pp, napi);
>> +    if (real->hdr_pp)
>> +            page_pool_enable_direct_recycling(real->hdr_pp, napi);
>> +
>> +    ret = ice_alloc_rx_bufs(real, ICE_DESC_UNUSED(real));
>> +
>> +    return __ice_qp_ena(priv->vsi, idx, ret);
> 
> ice_alloc_rx_bufs() is declared bool and returns !!cleaned_count, so it
> yields 0 on full success and 1 on partial/failed allocation:
> 
> bool ice_alloc_rx_bufs(struct ice_rx_ring *rx_ring, unsigned int 
> cleaned_count)
> {
>     ...
>     return !!cleaned_count;
> }
> 
> That bool is then passed straight into __ice_qp_ena()'s fail parameter,
> which only overwrites fail via 'if (!fail) fail = err;', so a value of 1
> propagates unchanged to ice_queue_start()'s return value.
> 
> netdev_rx_queue_reconfig() treats any non-zero return as failure and
> propagates it back to userspace (netlink, ethtool, io_uring zcrx install),
> where the callers expect a negative errno.
> 
> Before this patch, ice_qp_ena() obtained its fail value from
> ice_vsi_cfg_single_rxq(), which returns a signed errno.
> 
> Should this be converted to a proper negative errno before being handed
> to __ice_qp_ena(), for example:
> 
>       ret = ice_alloc_rx_bufs(real, ICE_DESC_UNUSED(real)) ? -ENOMEM : 0;
> 
>> +}
>> +
>> +static int ice_queue_stop(struct net_device *dev, void *per_queue_mem,
>> +                      int idx)
>> +{
>> +    const struct ice_netdev_priv *priv = netdev_priv(dev);
>> +    struct ice_rx_ring *real = priv->vsi->rx_rings[idx];
>> +    int ret;
>> +
>> +    ret = __ice_qp_dis(priv->vsi, idx);
>> +    if (ret)
>> +            return ret;
>> +
>> +    page_pool_disable_direct_recycling(real->pp);
>> +    if (real->hdr_pp)
>> +            page_pool_disable_direct_recycling(real->hdr_pp);
>> +
>> +    ice_zero_rx_ring(real);
>> +    memcpy(per_queue_mem, real, sizeof(*real));
>> +
>> +    return 0;
>> +}
> 
> Can this leak inflight page pool buffers?
> 
> ice_zero_rx_ring(real) is called before the memcpy to per_queue_mem, and
> ice_zero_rx_ring() resets both indices:
> 
> void ice_zero_rx_ring(struct ice_rx_ring *rx_ring)
> {
>     ...
>     rx_ring->next_to_clean = 0;
>     rx_ring->next_to_use = 0;
> }
> 
> So per_queue_mem captures a ring where next_to_clean == next_to_use == 0.
> 
> The core then invokes ndo_queue_mem_free(dev, old_mem), and the recycle
> loop in ice_queue_mem_free() is guarded by exactly those two indices:
> 
> void ice_queue_mem_free(struct net_device *dev, void *per_queue_mem)
> {
>     ...
>     for (u32 i = rx_ring->next_to_clean; i != rx_ring->next_to_use; ) {
>         libeth_rx_recycle_slow(rx_ring->rx_fqes[i].netmem);
> 
>         if (rx_ring->hdr_pp)
>             libeth_rx_recycle_slow(rx_ring->hdr_fqes[i].netmem);
> 
>         if (unlikely(++i == rx_ring->count))
>             i = 0;
>     }
>     ...
>     ice_rxq_pp_destroy(rx_ring);
> }
> 
> With 0 != 0 false on entry, the loop never runs, so any buffers that were
> in rx_fqes[old_ntc..old_ntu) (and hdr_fqes[] when header split is on) are
> never returned via libeth_rx_recycle_slow().  ice_rxq_pp_destroy() then
> kvfree()s the fqes arrays and calls page_pool_destroy() with outstanding
> inflight references that are no longer tracked anywhere.
> 
> The new ice_clean_rx_ring() wrapper keeps the original ordering:
> 
> void ice_clean_rx_ring(struct ice_rx_ring *rx_ring)
> {
>     ice_queue_mem_free(rx_ring->netdev, rx_ring);
>     ice_zero_rx_ring(rx_ring);
> }
> 
> Should ice_queue_stop() follow the same ordering and perform the memcpy
> before ice_zero_rx_ring(real), so the recycle loop in ice_queue_mem_free()
> sees the real ntc/ntu values?
> 
>> +
>> +const struct netdev_queue_mgmt_ops ice_queue_mgmt_ops = {
>> +    .ndo_queue_mem_alloc    = ice_queue_mem_alloc,
>> +    .ndo_queue_mem_free     = ice_queue_mem_free,
>> +    .ndo_queue_mem_size     = sizeof(struct ice_rx_ring),
>> +    .ndo_queue_start        = ice_queue_start,
>> +    .ndo_queue_stop         = ice_queue_stop,
>> +};

Those two are new to me, I'll double check.

Thanks,
Olek

Reply via email to