On Thu, 12 Mar 2026 23:01:11 +0100
Vincent Jardin <[email protected]> wrote:

> This series adds per-queue Tx data-rate limiting to the mlx5 PMD using
> hardware packet pacing (PP), and a symmetric rte_eth_get_queue_rate_limit()
> ethdev API to read back the configured rate.
> 
> Each Tx queue can be assigned an individual rate (in Mbps) at runtime via
> rte_eth_set_queue_rate_limit(). The mlx5 implementation allocates a
> dedicated PP index per rate from the HW rate table, programs it into the
> SQ via modify_sq, and shares identical rates across queues to conserve
> table entries. A PMD-specific API exposes per-queue PP diagnostics and
> rate table capacity.
> 
> Patch breakdown:
> 
>   1. doc/nics/mlx5: fix stale packet pacing documentation
>   2-3. common/mlx5: query PP capabilities and extend SQ modify
>   4-6. net/mlx5: per-queue PP infrastructure, rate_limit callback,
>        burst pacing devargs (tx_burst_bound, tx_typical_pkt_sz)
>   7. net/mlx5: testpmd command to query per-queue rate state
>   8. ethdev: add rte_eth_get_queue_rate_limit() symmetric getter
>        + testpmd "show port <id> queue <id> rate" command
>   9. net/mlx5: rate table capacity query API
> 
> Usage with testpmd:
>   set port 0 queue 0 rate 1000
>   set port 0 queue 1 rate 5000
>   set port 0 queue 0 rate 0      # disable
>   show port 0 queue 0 rate       # generic ethdev query
>   mlx5 port 0 txq 0 rate show    # mlx5 PMD-specific query
> 
> Changes since v2:
> 
> Patch 4 (per-queue packet pacing infrastructure):
>   - Folded "share pacing rate table entries across queues" into
>     this patch (was a separate patch in v2)
> 
> Patch 5 (support per-queue rate limiting):
>   - Remove redundant queue_idx >= nb_tx_queues check (ethdev
>     layer already validates before calling the PMD callback)
> 
> Patch 8 (ethdev getter):
>   - Add testpmd "show port <id> queue <id> rate" command
>     in app/test-pmd/cmdline.c using rte_eth_get_queue_rate_limit()
>   - Drop release notes (targeting 26.07, not 26.03)
>   - Remove redundant queue_idx bounds check from mlx5 getter
> 
> Patch 9 (rate table capacity query):
>   - Use MLX5_MEM_SYS flag in mlx5_malloc() for system memory
>   - Minor code style cleanups (line wrapping, cast formatting)
> 
> Changes since v1:
> 
> Addressed review feedback from Stephen Hemminger's AI:
> 
> Patch 4 (per-queue packet pacing infrastructure):
>   - Validate rate_mbps against HCA packet_pacing_min_rate and
>     packet_pacing_max_rate bounds; return -ERANGE on out-of-range
>   - Widen rate_kbps from uint32_t to uint64_t to prevent
>     overflow on rate_mbps * 1000
>   - Remove early mlx5_txq_free_pp_rate_limit() call from the
>     allocator (moved to caller, see patch 5)
> 
> Patch 5 (support per-queue rate limiting):
>   - Fix PP index leak on modify_sq failure: allocate new PP into a
>     temporary struct mlx5_txq_rate_limit; only swap into txq_ctrl->rl
>     after modify_sq succeeds. On failure the old PP context stays intact.
>   - Set rte_errno = -ret before returning errors from both the
>     disable (tx_rate=0) and enable paths
> 
> Patch 7 (testpmd command to query per-queue rate limit):
>   - Fix inverted rte_eth_tx_queue_is_valid() return value check:
>     was "if (rte_eth_tx_queue_is_valid(...))" (accepts invalid queues),
>     changed to "if (rte_eth_tx_queue_is_valid(...) != 0)"
> 
> Patch 9 (rate table capacity query, was patch 10 in v1):
>   - Replace uint16_t seen[RTE_MAX_QUEUES_PER_PORT] (2 KB stack array)
>     with heap-allocated mlx5_malloc(priv->txqs_n, ...) + mlx5_free()
>   - Add early return when txqs == NULL || txqs_n == 0
>   - Document in the API Doxygen that "used" reflects only the queried
>     port's queues; other ports on the same device may also consume
>     rate table entries
>   - Add -ENOMEM to documented return values
> 
> Hardware tested:
>   - ConnectX-6 Dx (packet pacing with MLX5_DATA_RATE)
> 
> Vincent Jardin (9):
>   doc/nics/mlx5: fix stale packet pacing documentation
>   common/mlx5: query packet pacing rate table capabilities
>   common/mlx5: extend SQ modify to support rate limit update
>   net/mlx5: add per-queue packet pacing infrastructure
>   net/mlx5: support per-queue rate limiting
>   net/mlx5: add burst pacing devargs
>   net/mlx5: add testpmd command to query per-queue rate limit
>   ethdev: add getter for per-queue Tx rate limit
>   net/mlx5: add rate table capacity query API
> 
>  app/test-pmd/cmdline.c               |  69 +++++++++++++
>  doc/guides/nics/mlx5.rst             | 125 ++++++++++++++++++------
>  drivers/common/mlx5/mlx5_devx_cmds.c |  20 ++++
>  drivers/common/mlx5/mlx5_devx_cmds.h |  14 ++-
>  drivers/net/mlx5/mlx5.c              |  46 +++++++++
>  drivers/net/mlx5/mlx5.h              |  13 +++
>  drivers/net/mlx5/mlx5_testpmd.c      |  93 ++++++++++++++++++
>  drivers/net/mlx5/mlx5_tx.c           | 104 +++++++++++++++++++-
>  drivers/net/mlx5/mlx5_tx.h           |   5 +
>  drivers/net/mlx5/mlx5_txpp.c         |  85 ++++++++++++++++
>  drivers/net/mlx5/mlx5_txq.c          | 141 +++++++++++++++++++++++++++
>  drivers/net/mlx5/rte_pmd_mlx5.h      |  62 ++++++++++++
>  lib/ethdev/ethdev_driver.h           |   7 ++
>  lib/ethdev/rte_ethdev.c              |  28 ++++++
>  lib/ethdev/rte_ethdev.h              |  24 +++++
>  15 files changed, 802 insertions(+), 33 deletions(-)
> 

Only minor things left to address.

Error (1):

Patch 5: git bisect breakage with obj->sq vs obj->sq_obj.sq. Patch 5 accesses
the SQ DevX object via txq_ctrl->obj->sq, which is the hairpin union member.
For non-hairpin DevX/HWS queues the correct field is obj->sq_obj.sq. Patch 7
fixes this, but patch 5 is broken as submitted: each commit needs to be
independently correct for bisect. The sq_obj.sq change should be moved into
patch 5.
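
To illustrate the aliasing problem, here is a minimal sketch. The struct
layouts and the helper txq_sq_devx_obj() are hypothetical, simplified
stand-ins and not the real mlx5 definitions; only the sq / sq_obj.sq naming
follows the driver. The point is that the two members live at different
offsets of the same union, so reading the hairpin member on a regular queue
returns whatever happens to be stored first in the other layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the mlx5 types. */
struct devx_obj {
	int id;
};

struct devx_sq {
	struct devx_obj *sq;	/* SQ DevX object of a regular queue */
	void *umem_obj;
};

struct txq_obj {
	union {
		struct devx_obj *sq;		/* valid for hairpin queues only */
		struct {
			void *dev;
			struct devx_sq sq_obj;	/* valid for non-hairpin queues */
		};
	};
};

/* Pick the SQ DevX object from the union member that matches the queue type. */
static struct devx_obj *
txq_sq_devx_obj(const struct txq_obj *obj, bool is_hairpin)
{
	assert(obj != NULL);
	return is_hairpin ? obj->sq : obj->sq_obj.sq;
}
```

With a selection like this done in patch 5 itself, the modify_sq call would
get the right object on every commit of the series, assuming the queue type
is known at that point.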

Warnings (3):

Patch 8: mlx5_get_queue_rate_limit() doesn't bounds-check queue_idx before
array access. The ethdev layer does, but the set path checks it too, so the
two paths are inconsistent.

Patch 8: release notes are missing for the new ethdev API, and the new
eth_dev_ops member needs ethdev maintainer agreement.

Patch 9: The "used" count is per-port but the API name suggests device-wide.
Consider renaming or iterating all ports on the shared context.
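
If the device-wide option is chosen for patch 9, the counting could look
roughly like the sketch below. Everything here is hypothetical: struct
port_txqs and count_used_pp() are illustrative names, not mlx5 code, and the
sketch assumes a PP index of 0 means "no rate limit". It collects the PP
indexes of every Tx queue of every port sharing the context and counts the
distinct non-zero ones, so a rate shared by several queues or ports is
counted once.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified view of one port; not the real mlx5 structures. */
struct port_txqs {
	const uint16_t *pp_index;	/* PP index per Tx queue, 0 = no limit (assumed) */
	uint16_t txqs_n;
};

static int
cmp_u16(const void *a, const void *b)
{
	uint16_t x = *(const uint16_t *)a;
	uint16_t y = *(const uint16_t *)b;

	return (x > y) - (x < y);
}

/* Count distinct non-zero PP indexes over all ports; -ENOMEM on failure. */
static int
count_used_pp(const struct port_txqs *ports, size_t nb_ports)
{
	size_t total = 0, n = 0, i;
	uint16_t q;
	uint16_t *seen;
	int used = 0;

	assert(ports != NULL || nb_ports == 0);
	for (i = 0; i < nb_ports; i++)
		total += ports[i].txqs_n;
	if (total == 0)
		return 0;
	seen = malloc(total * sizeof(*seen));
	if (seen == NULL)
		return -ENOMEM;
	/* Gather every active index, then sort so duplicates are adjacent. */
	for (i = 0; i < nb_ports; i++)
		for (q = 0; q < ports[i].txqs_n; q++)
			if (ports[i].pp_index[q] != 0)
				seen[n++] = ports[i].pp_index[q];
	qsort(seen, n, sizeof(*seen), cmp_u16);
	for (i = 0; i < n; i++)
		if (i == 0 || seen[i] != seen[i - 1])
			used++;
	free(seen);
	return used;
}
```

The sort-and-scan approach keeps the temporary buffer on the heap, in line
with the v2 change that dropped the stack array.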
