Nice clarification, Vincent.
Thank you.

Acked-by: Viacheslav Ovsiienko <[email protected]>

> -----Original Message-----
> From: Vincent Jardin <[email protected]>
> Sent: Tuesday, March 10, 2026 11:20 AM
> To: [email protected]
> Cc: Raslan Darawsheh <[email protected]>; NBU-Contact-Thomas Monjalon
> (EXTERNAL) <[email protected]>; [email protected];
> Dariusz Sosnowski <[email protected]>; Slava Ovsiienko
> <[email protected]>; Bing Zhao <[email protected]>; Ori Kam
> <[email protected]>; Suanming Mou <[email protected]>; Matan Azrad
> <[email protected]>; Vincent Jardin <[email protected]>
> Subject: [PATCH v1 01/10] doc/nics/mlx5: fix stale packet pacing documentation
> 
> The Tx Scheduling section incorrectly stated that timestamps can only be put
> on the first packet in a burst. The driver actually checks every packet's
> ol_flags for the timestamp dynamic flag and inserts a dedicated WAIT WQE per
> timestamped packet. The eMPW path also breaks batches when a timestamped
> packet is encountered.
> 
> Additionally, the ConnectX-7+ wait-on-time capability was only briefly
> mentioned in the tx_pp parameter section with no explanation of how it differs
> from the ConnectX-6 Dx Clock Queue approach.
> 
> This patch:
> - Removes the stale first-packet-only limitation
> - Documents both scheduling mechanisms (ConnectX-6 Dx Clock Queue and
>   ConnectX-7+ wait-on-time) with separate requirements tables
> - Clarifies that tx_pp is specific to ConnectX-6 Dx
> - Fixes tx_skew applicability to cover both hardware generations
> - Updates the Send Scheduling Counters intro to reflect that timestamp
>   validation counters also apply to ConnectX-7+ wait-on-time mode
> 
> Fixes: 8f848f32fc24 ("net/mlx5: introduce send scheduling devargs")
> 
> Signed-off-by: Vincent Jardin <[email protected]>
> ---
>  doc/guides/nics/mlx5.rst | 109 ++++++++++++++++++++++++++++-----------
>  1 file changed, 78 insertions(+), 31 deletions(-)
> 
> diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
> index 2529c2f4c8..5b097dbc90 100644
> --- a/doc/guides/nics/mlx5.rst
> +++ b/doc/guides/nics/mlx5.rst
> @@ -553,27 +553,32 @@ for an additional list of options shared with other mlx5 drivers.
> 
>  - ``tx_pp`` parameter [int]
> 
> +  This parameter applies to **ConnectX-6 Dx** only.
>    If a nonzero value is specified the driver creates all necessary internal
> -  objects to provide accurate packet send scheduling on mbuf timestamps.
> +  objects (Clock Queue and Rearm Queue) to provide accurate packet send
> +  scheduling on mbuf timestamps using a cross-channel approach.
>    The positive value specifies the scheduling granularity in nanoseconds,
>    the packet send will be accurate up to specified digits. The allowed range is
>    from 500 to 1 million of nanoseconds. The negative value specifies the module
>    of granularity and engages the special test mode the check the schedule rate.
>    By default (if the ``tx_pp`` is not specified) send scheduling on timestamps
> -  feature is disabled.
> +  feature is disabled on ConnectX-6 Dx.
> 
> -  Starting with ConnectX-7 the capability to schedule traffic directly
> -  on timestamp specified in descriptor is provided,
> -  no extra objects are needed anymore and scheduling capability
> -  is advertised and handled regardless ``tx_pp`` parameter presence.
> +  Starting with **ConnectX-7** the hardware provides a native wait-on-time
> +  capability that inserts the scheduling delay directly in the WQE descriptor.
> +  No Clock Queue or Rearm Queue is needed and the ``tx_pp`` parameter is not
> +  required. The driver automatically advertises send scheduling support when
> +  the HCA wait-on-time capability is detected. The ``tx_skew`` parameter can
> +  still be used on ConnectX-7 and above to compensate for wire delay.
> 
>  - ``tx_skew`` parameter [int]
> 
>    The parameter adjusts the send packet scheduling on timestamps and represents
>    the average delay between beginning of the transmitting descriptor processing
>    by the hardware and appearance of actual packet data on the wire. The value
> -  should be provided in nanoseconds and is valid only if ``tx_pp`` parameter is
> -  specified. The default value is zero.
> +  should be provided in nanoseconds and applies to both ConnectX-6 Dx
> +  (with ``tx_pp``) and ConnectX-7+ (wait-on-time) scheduling modes.
> +  The default value is zero.
> 
>  - ``tx_vec_en`` parameter [int]
> 
> @@ -883,9 +888,13 @@ Send Scheduling Counters
> 
>  The mlx5 PMD provides a comprehensive set of counters designed for
>  debugging and diagnostics related to packet scheduling during transmission.
> -These counters are applicable only if the port was configured with the ``tx_pp`` devarg
> -and reflect the status of the PMD scheduling infrastructure
> -based on Clock and Rearm Queues, used as a workaround on ConnectX-6 DX NICs.
> +The first group of counters (prefixed ``tx_pp_``) reflects the status
> +of the Clock Queue and Rearm Queue infrastructure used on ConnectX-6 Dx
> +and is applicable only if the port was configured with the ``tx_pp`` devarg.
> +The timestamp validation counters
> +(``tx_pp_timestamp_past_errors``, ``tx_pp_timestamp_future_errors``,
> +``tx_pp_timestamp_order_errors``) are also reported on ConnectX-7 and
> +above in wait-on-time mode, without requiring ``tx_pp``.
> 
>  ``tx_pp_missed_interrupt_errors``
>    Indicates that the Rearm Queue interrupt was not serviced on time.
> @@ -1960,31 +1969,54 @@ Limitations
>  Tx Scheduling
>  ~~~~~~~~~~~~~
> 
> -When PMD sees the ``RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME`` set on the packet
> -being sent it tries to synchronize the time of packet appearing on
> -the wire with the specified packet timestamp. If the specified one
> -is in the past it should be ignored, if one is in the distant future
> -it should be capped with some reasonable value (in range of seconds).
> -These specific cases ("too late" and "distant future") can be optionally
> -reported via device xstats to assist applications to detect the
> -time-related problems.
> -
> -The timestamp upper "too-distant-future" limit
> -at the moment of invoking the Tx burst routine
> -can be estimated as ``tx_pp`` option (in nanoseconds) multiplied by 2^23.
> +When the PMD sees ``RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME`` set on a
> +packet being sent it inserts a dedicated WAIT WQE to synchronize the
> +time of the packet appearing on the wire with the specified timestamp.
> +Every packet in a burst that carries the timestamp dynamic flag is
> +individually scheduled -- there is no restriction to the first packet only.
> +
> +If the specified timestamp is in the past, the packet is sent immediately.
> +If it is in the distant future it should be capped with some reasonable
> +value (in range of seconds). These specific cases ("too late" and
> +"distant future") can be optionally reported via device xstats to
> +assist applications to detect time-related problems.
> +
> +The eMPW (enhanced Multi-Packet Write) data path automatically breaks
> +the batch when a timestamped packet is encountered, ensuring each
> +scheduled packet gets its own WAIT WQE.
> +
> +Two hardware mechanisms are supported:
> +
> +**ConnectX-6 Dx -- Clock Queue (cross-channel)**
> +   The driver creates a Clock Queue and a Rearm Queue that together
> +   provide a time reference for scheduling. This mode requires the
> +   :ref:`tx_pp <mlx5_tx_pp_param>` devarg. The timestamp upper
> +   "too-distant-future" limit at the moment of invoking the Tx burst
> +   routine can be estimated as ``tx_pp`` (in nanoseconds) multiplied
> +   by 2^23.
> +
> +**ConnectX-7 and above -- wait-on-time**
> +   The hardware supports placing the scheduling delay directly inside
> +   the WQE descriptor. No Clock Queue or Rearm Queue is needed and the
> +   ``tx_pp`` devarg is **not** required. The driver automatically
> +   advertises send scheduling support when the HCA wait-on-time
> +   capability is detected.
> +
>  Please note, for the testpmd txonly mode,
>  the limit is deduced from the expression::
> 
>     (n_tx_descriptors / burst_size + 1) * inter_burst_gap
> 
> -There is no any packet reordering according timestamps is supposed,
> -neither within packet burst, nor between packets, it is an entirely
> -application responsibility to generate packets and its timestamps
> -in desired order.
> +There is no packet reordering according to timestamps, neither within a
> +packet burst, nor between packets. It is entirely the application's
> +responsibility to generate packets and their timestamps in the desired
> +order.
> 
>  Requirements
>  ^^^^^^^^^^^^
> 
> +ConnectX-6 Dx (Clock Queue mode):
> +
>  =========  =============
>  Minimum    Version
>  =========  =============
> @@ -1996,20 +2028,35 @@ rdma-core
>  DPDK       20.08
>  =========  =============
> 
> +ConnectX-7 and above (wait-on-time mode):
> +
> +=========  =============
> +Minimum    Version
> +=========  =============
> +hardware   ConnectX-7
> +=========  =============
> +
>  Firmware configuration
>  ^^^^^^^^^^^^^^^^^^^^^^
> 
>  Runtime configuration
>  ^^^^^^^^^^^^^^^^^^^^^
> 
> -To provide the packet send scheduling on mbuf timestamps the ``tx_pp``
> -parameter should be specified.
> +**ConnectX-6 Dx**: the :ref:`tx_pp <mlx5_tx_pp_param>` parameter must
> +be specified to enable send scheduling on mbuf timestamps.
> +
> +**ConnectX-7+**: no devarg is required. Send scheduling is
> +automatically enabled when the HCA reports the wait-on-time capability.
> +
> +On both hardware generations the ``tx_skew`` parameter can be used to
> +compensate for the delay between descriptor processing and actual wire
> +time.
> 
>  Limitations
>  ^^^^^^^^^^^
> 
> -#. The timestamps can be put only in the first packet
> -   in the burst providing the entire burst scheduling.
> +#. On ConnectX-6 Dx (Clock Queue mode) timestamps too far in the future
> +   are capped (see the ``tx_pp`` x 2^23 limit above).
> 
> 
>  .. _mlx5_tx_inline:
> --
> 2.43.0
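
For readers of this thread, the two limits quoted in the patch (the ``tx_pp`` x 2^23 bound for Clock Queue mode and the testpmd txonly expression) can be sketched as plain arithmetic. This is only an illustration: the function names below are hypothetical and are not part of the mlx5 PMD or testpmd, and the code assumes integer division and the units stated in the documentation text.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not a driver API.
 * Estimates the upper "too-distant-future" timestamp limit for
 * ConnectX-6 Dx Clock Queue mode as documented: tx_pp (ns) * 2^23.
 */
static inline uint64_t
tx_pp_future_limit_ns(uint32_t tx_pp_ns)
{
	/* Widen before shifting so the product cannot overflow 32 bits. */
	return (uint64_t)tx_pp_ns << 23;
}

/*
 * Hypothetical helper, not a testpmd API.
 * Evaluates the documented txonly expression:
 *   (n_tx_descriptors / burst_size + 1) * inter_burst_gap
 * Integer division is an assumption of this sketch; burst_size must be
 * non-zero. The result is in the same unit as inter_burst_gap.
 */
static inline uint64_t
txonly_limit(uint32_t n_tx_descriptors, uint32_t burst_size,
	     uint64_t inter_burst_gap)
{
	return ((uint64_t)n_tx_descriptors / burst_size + 1) * inter_burst_gap;
}
```

With the minimum documented granularity of 500 ns, the first helper yields 4194304000 ns, roughly 4.2 seconds, which is consistent with the "in range of seconds" wording in the Tx Scheduling text.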