Wire rte_eth_set_queue_rate_limit() to the mlx5 PMD. The callback
allocates a per-queue PP index with the requested data rate, then
modifies the live SQ via modify_bitmask bit 0 to apply the new
packet_pacing_rate_limit_index — no queue teardown required.

Setting tx_rate=0 clears the PP index on the SQ and frees it.

Capability check uses hca_attr.qos.packet_pacing directly (not
dev_cap.txpp_en which requires Clock Queue prerequisites). This
allows per-queue rate limiting without the tx_pp devarg.

The callback rejects hairpin queues and queues whose SQ is not
yet created.

testpmd usage (no testpmd changes needed):
  set port 0 queue 0 rate 1000
  set port 0 queue 1 rate 5000
  set port 0 queue 0 rate 0     # disable

Supported hardware:
- ConnectX-6 Dx: full support, per-SQ rate via HW rate table
- ConnectX-7/8: full support, coexists with wait-on-time scheduling
- BlueField-2/3: full support as DPU rep ports

Not supported:
- ConnectX-5: packet_pacing exists but dynamic SQ modify may not
  work on all firmware versions
- ConnectX-4 Lx and earlier: no packet_pacing capability

Signed-off-by: Vincent Jardin <[email protected]>
---
 doc/guides/nics/features/mlx5.ini |   1 +
 doc/guides/nics/mlx5.rst          |  54 ++++++++++++++
 drivers/net/mlx5/mlx5.c           |   2 +
 drivers/net/mlx5/mlx5_tx.h        |   2 +
 drivers/net/mlx5/mlx5_txq.c       | 118 ++++++++++++++++++++++++++++++
 5 files changed, 177 insertions(+)

diff --git a/doc/guides/nics/features/mlx5.ini b/doc/guides/nics/features/mlx5.ini
index 4f9c4c309b..3b3eda28b8 100644
--- a/doc/guides/nics/features/mlx5.ini
+++ b/doc/guides/nics/features/mlx5.ini
@@ -30,6 +30,7 @@ Inner RSS            = Y
 SR-IOV               = Y
 VLAN filter          = Y
 Flow control         = Y
+Rate limitation      = Y
 CRC offload          = Y
 VLAN offload         = Y
 L3 checksum offload  = Y
diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index 6bb8c07353..c72a60f084 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -580,6 +580,60 @@ for an additional list of options shared with other mlx5 drivers.
   (with ``tx_pp``) and ConnectX-7+ (wait-on-time) scheduling modes.
   The default value is zero.
 
+.. _mlx5_per_queue_rate_limit:
+
+Per-Queue Tx Rate Limiting
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The mlx5 PMD supports per-queue Tx rate limiting via the standard
+ethdev API ``rte_eth_set_queue_rate_limit()``.
+
+This feature uses the hardware packet pacing mechanism to enforce a data
+rate on individual TX queues without tearing down the queue. The rate is
+specified in Mbps.
+
+**Requirements:**
+
+- ConnectX-6 Dx or later with ``packet_pacing`` HCA capability.
+- The DevX path must be used (default). The legacy Verbs path
+  (``dv_flow_en=0``) does not support dynamic SQ modification and
+  returns ``-EINVAL``.
+- The queue must be started (SQ in RDY state) before setting a rate.
+
+**Supported hardware:**
+
+- ConnectX-6 Dx: per-SQ rate via HW rate table.
+- ConnectX-7/8: full support, coexists with wait-on-time scheduling.
+- BlueField-2/3: full support as DPU rep ports.
+
+**Not supported:**
+
+- ConnectX-5: ``packet_pacing`` exists but dynamic SQ modify may not
+  work on all firmware versions.
+- ConnectX-4 Lx and earlier: no ``packet_pacing`` capability.
+
+**Rate table sharing:**
+
+The hardware rate table has a limited number of entries (typically 128 on
+ConnectX-6 Dx). When multiple queues are configured with identical rate
+parameters, the kernel mlx5 driver shares a single rate table entry across
+them. Each queue still has its own independent SQ and enforces the rate
+independently; queues are never merged. The rate cap applies per-queue:
+if two queues share the same 1000 Mbps entry, each can send up to
+1000 Mbps independently; they do not share a combined budget.
+
+This sharing is transparent and only affects table capacity: 128 entries
+can serve thousands of queues as long as many use the same rate. Queues
+with different rates consume separate entries.
+
+**Usage with testpmd:**
+
+.. code-block:: console
+
+   testpmd> set port 0 queue 0 rate 1000
+   testpmd> show port 0 queue 0 rate
+   testpmd> set port 0 queue 0 rate 0
+
 - ``tx_vec_en`` parameter [int]
 
   A nonzero value enables Tx vector with ConnectX-5 NICs and above.
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index e795948187..e718f0fa8c 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -2621,6 +2621,7 @@ const struct eth_dev_ops mlx5_dev_ops = {
        .map_aggr_tx_affinity = mlx5_map_aggr_tx_affinity,
        .rx_metadata_negotiate = mlx5_flow_rx_metadata_negotiate,
        .get_restore_flags = mlx5_get_restore_flags,
+       .set_queue_rate_limit = mlx5_set_queue_rate_limit,
 };
 
 /* Available operations from secondary process. */
@@ -2714,6 +2715,7 @@ const struct eth_dev_ops mlx5_dev_ops_isolate = {
        .count_aggr_ports = mlx5_count_aggr_ports,
        .map_aggr_tx_affinity = mlx5_map_aggr_tx_affinity,
        .get_restore_flags = mlx5_get_restore_flags,
+       .set_queue_rate_limit = mlx5_set_queue_rate_limit,
 };
 
 /**
diff --git a/drivers/net/mlx5/mlx5_tx.h b/drivers/net/mlx5/mlx5_tx.h
index 51f330454a..975ff57acd 100644
--- a/drivers/net/mlx5/mlx5_tx.h
+++ b/drivers/net/mlx5/mlx5_tx.h
@@ -222,6 +222,8 @@ struct mlx5_txq_ctrl *mlx5_txq_get(struct rte_eth_dev *dev, uint16_t idx);
 int mlx5_txq_release(struct rte_eth_dev *dev, uint16_t idx);
 int mlx5_txq_releasable(struct rte_eth_dev *dev, uint16_t idx);
 int mlx5_txq_verify(struct rte_eth_dev *dev);
+int mlx5_set_queue_rate_limit(struct rte_eth_dev *dev, uint16_t queue_idx,
+                             uint32_t tx_rate);
 int mlx5_txq_get_sqn(struct mlx5_txq_ctrl *txq);
 void mlx5_txq_alloc_elts(struct mlx5_txq_ctrl *txq_ctrl);
 void mlx5_txq_free_elts(struct mlx5_txq_ctrl *txq_ctrl);
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index 3356c89758..ce08363ca9 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -1363,6 +1363,124 @@ mlx5_txq_release(struct rte_eth_dev *dev, uint16_t idx)
        return 0;
 }
 
+/**
+ * Set per-queue packet pacing rate limit.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param queue_idx
+ *   TX queue index.
+ * @param tx_rate
+ *   TX rate in Mbps, 0 to disable rate limiting.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_set_queue_rate_limit(struct rte_eth_dev *dev, uint16_t queue_idx,
+                         uint32_t tx_rate)
+{
+       struct mlx5_priv *priv = dev->data->dev_private;
+       struct mlx5_dev_ctx_shared *sh = priv->sh;
+       struct mlx5_txq_ctrl *txq_ctrl;
+       struct mlx5_devx_obj *sq_devx;
+       struct mlx5_devx_modify_sq_attr sq_attr = { 0 };
+       struct mlx5_txq_rate_limit new_rate_limit = { 0 };
+       int ret;
+
+       if (!sh->cdev->config.hca_attr.qos.packet_pacing) {
+               DRV_LOG(ERR, "Port %u packet pacing not supported.",
+                       dev->data->port_id);
+               rte_errno = ENOTSUP;
+               return -rte_errno;
+       }
+       if (priv->txqs == NULL || (*priv->txqs)[queue_idx] == NULL) {
+               DRV_LOG(ERR, "Port %u Tx queue %u not configured.",
+                       dev->data->port_id, queue_idx);
+               rte_errno = EINVAL;
+               return -rte_errno;
+       }
+       txq_ctrl = container_of((*priv->txqs)[queue_idx],
+                               struct mlx5_txq_ctrl, txq);
+       if (txq_ctrl->is_hairpin) {
+               DRV_LOG(ERR, "Port %u Tx queue %u is hairpin.",
+                       dev->data->port_id, queue_idx);
+               rte_errno = EINVAL;
+               return -rte_errno;
+       }
+       if (txq_ctrl->obj == NULL) {
+               DRV_LOG(ERR, "Port %u Tx queue %u not initialized.",
+                       dev->data->port_id, queue_idx);
+               rte_errno = EINVAL;
+               return -rte_errno;
+       }
+       /*
+        * For non-hairpin queues the SQ DevX object lives in
+        * obj->sq_obj.sq (used by DevX/HWS mode), while hairpin
+        * queues use obj->sq directly. These are different members
+        * of a union inside mlx5_txq_obj.
+        */
+       sq_devx = txq_ctrl->obj->sq_obj.sq;
+       if (sq_devx == NULL) {
+               DRV_LOG(ERR, "Port %u Tx queue %u SQ not ready.",
+                       dev->data->port_id, queue_idx);
+               rte_errno = EINVAL;
+               return -rte_errno;
+       }
+       if (dev->data->tx_queue_state[queue_idx] !=
+           RTE_ETH_QUEUE_STATE_STARTED) {
+               DRV_LOG(ERR,
+                       "Port %u Tx queue %u is not started, start it before setting a rate.",
+                       dev->data->port_id, queue_idx);
+               rte_errno = EINVAL;
+               return -rte_errno;
+       }
+       if (tx_rate == 0) {
+               /* Disable rate limiting. */
+               if (txq_ctrl->rate_limit.pp_id == 0)
+                       return 0; /* Already disabled. */
+               sq_attr.sq_state = MLX5_SQC_STATE_RDY;
+               sq_attr.state = MLX5_SQC_STATE_RDY;
+               sq_attr.rl_update = 1;
+               sq_attr.packet_pacing_rate_limit_index = 0;
+               ret = mlx5_devx_cmd_modify_sq(sq_devx, &sq_attr);
+               if (ret) {
+                       DRV_LOG(ERR,
+                               "Port %u Tx queue %u failed to clear rate.",
+                               dev->data->port_id, queue_idx);
+                       rte_errno = -ret;
+                       return ret;
+               }
+               mlx5_txq_free_pp_rate_limit(&txq_ctrl->rate_limit);
+               DRV_LOG(DEBUG, "Port %u Tx queue %u rate limit disabled.",
+                       dev->data->port_id, queue_idx);
+               return 0;
+       }
+       /* Allocate a new PP index for the requested rate into a temp. */
+       ret = mlx5_txq_alloc_pp_rate_limit(sh, &new_rate_limit, tx_rate);
+       if (ret)
+               return ret;
+       /* Modify live SQ to use the new PP index. */
+       sq_attr.sq_state = MLX5_SQC_STATE_RDY;
+       sq_attr.state = MLX5_SQC_STATE_RDY;
+       sq_attr.rl_update = 1;
+       sq_attr.packet_pacing_rate_limit_index = new_rate_limit.pp_id;
+       ret = mlx5_devx_cmd_modify_sq(sq_devx, &sq_attr);
+       if (ret) {
+               DRV_LOG(ERR, "Port %u Tx queue %u failed to set rate %u Mbps.",
+                       dev->data->port_id, queue_idx, tx_rate);
+               mlx5_txq_free_pp_rate_limit(&new_rate_limit);
+               rte_errno = -ret;
+               return ret;
+       }
+       /* SQ updated — release old PP context, install new one. */
+       mlx5_txq_free_pp_rate_limit(&txq_ctrl->rate_limit);
+       txq_ctrl->rate_limit = new_rate_limit;
+       DRV_LOG(DEBUG, "Port %u Tx queue %u rate set to %u Mbps (PP idx %u).",
+               dev->data->port_id, queue_idx, tx_rate, txq_ctrl->rate_limit.pp_id);
+       return 0;
+}
+
 /**
  * Verify if the queue can be released.
  *
-- 
2.43.0
