Nice clarification, Vincent. Thank you.

Acked-by: Viacheslav Ovsiienko <[email protected]>

> -----Original Message-----
> From: Vincent Jardin <[email protected]>
> Sent: Tuesday, March 10, 2026 11:20 AM
> To: [email protected]
> Cc: Raslan Darawsheh <[email protected]>; NBU-Contact-Thomas Monjalon
> (EXTERNAL) <[email protected]>; [email protected];
> Dariusz Sosnowski <[email protected]>; Slava Ovsiienko
> <[email protected]>; Bing Zhao <[email protected]>; Ori Kam
> <[email protected]>; Suanming Mou <[email protected]>; Matan Azrad
> <[email protected]>; Vincent Jardin <[email protected]>
> Subject: [PATCH v1 01/10] doc/nics/mlx5: fix stale packet pacing documentation
>
> The Tx Scheduling section incorrectly stated that timestamps can only be put on
> the first packet in a burst. The driver actually checks every packet's ol_flags for
> the timestamp dynamic flag and inserts a dedicated WAIT WQE per
> timestamped packet. The eMPW path also breaks batches when a timestamped
> packet is encountered.
>
> Additionally, the ConnectX-7+ wait-on-time capability was only briefly
> mentioned in the tx_pp parameter section with no explanation of how it differs
> from the ConnectX-6 Dx Clock Queue approach.
>
> This patch:
> - Removes the stale first-packet-only limitation
> - Documents both scheduling mechanisms (ConnectX-6 Dx Clock Queue and
>   ConnectX-7+ wait-on-time) with separate requirements tables
> - Clarifies that tx_pp is specific to ConnectX-6 Dx
> - Fixes tx_skew applicability to cover both hardware generations
> - Updates the Send Scheduling Counters intro to reflect that timestamp
>   validation counters also apply to ConnectX-7+ wait-on-time mode
>
> Fixes: 8f848f32fc24 ("net/mlx5: introduce send scheduling devargs")
>
> Signed-off-by: Vincent Jardin <[email protected]>
> ---
>  doc/guides/nics/mlx5.rst | 109 ++++++++++++++++++++++++++++-----------
>  1 file changed, 78 insertions(+), 31 deletions(-)
>
> diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
> index 2529c2f4c8..5b097dbc90 100644
> --- a/doc/guides/nics/mlx5.rst
> +++ b/doc/guides/nics/mlx5.rst
> @@ -553,27 +553,32 @@ for an additional list of options shared with other mlx5 drivers.
>
>  - ``tx_pp`` parameter [int]
>
> +  This parameter applies to **ConnectX-6 Dx** only.
>    If a nonzero value is specified the driver creates all necessary internal
> -  objects to provide accurate packet send scheduling on mbuf timestamps.
> +  objects (Clock Queue and Rearm Queue) to provide accurate packet send
> +  scheduling on mbuf timestamps using a cross-channel approach.
>    The positive value specifies the scheduling granularity in nanoseconds,
>    the packet send will be accurate up to specified digits. The allowed range is
>    from 500 to 1 million of nanoseconds. The negative value specifies the module
>    of granularity and engages the special test mode the check the schedule rate.
>    By default (if the ``tx_pp`` is not specified) send scheduling on timestamps
> -  feature is disabled.
> +  feature is disabled on ConnectX-6 Dx.
>
> -  Starting with ConnectX-7 the capability to schedule traffic directly
> -  on timestamp specified in descriptor is provided,
> -  no extra objects are needed anymore and scheduling capability
> -  is advertised and handled regardless ``tx_pp`` parameter presence.
> +  Starting with **ConnectX-7** the hardware provides a native
> +  wait-on-time capability that inserts the scheduling delay directly in the WQE descriptor.
> +  No Clock Queue or Rearm Queue is needed and the ``tx_pp`` parameter
> +  is not required. The driver automatically advertises send scheduling
> +  support when the HCA wait-on-time capability is detected. The
> +  ``tx_skew`` parameter can still be used on ConnectX-7 and above to compensate for wire delay.
>
>  - ``tx_skew`` parameter [int]
>
>    The parameter adjusts the send packet scheduling on timestamps and represents
>    the average delay between beginning of the transmitting descriptor processing
>    by the hardware and appearance of actual packet data on the wire. The value
> -  should be provided in nanoseconds and is valid only if ``tx_pp`` parameter is
> -  specified. The default value is zero.
> +  should be provided in nanoseconds and applies to both ConnectX-6 Dx
> +  (with ``tx_pp``) and ConnectX-7+ (wait-on-time) scheduling modes.
> +  The default value is zero.
>
>  - ``tx_vec_en`` parameter [int]
>
> @@ -883,9 +888,13 @@ Send Scheduling Counters
>
>  The mlx5 PMD provides a comprehensive set of counters designed for
>  debugging and diagnostics related to packet scheduling during transmission.
> -These counters are applicable only if the port was configured with the ``tx_pp`` devarg
> -and reflect the status of the PMD scheduling infrastructure
> -based on Clock and Rearm Queues, used as a workaround on ConnectX-6 DX NICs.
> +The first group of counters (prefixed ``tx_pp_``) reflects the status
> +of the Clock Queue and Rearm Queue infrastructure used on ConnectX-6 Dx
> +and is applicable only if the port was configured with the ``tx_pp`` devarg.
> +The timestamp validation counters
> +(``tx_pp_timestamp_past_errors``, ``tx_pp_timestamp_future_errors``,
> +``tx_pp_timestamp_order_errors``) are also reported on ConnectX-7 and
> +above in wait-on-time mode, without requiring ``tx_pp``.
>
>  ``tx_pp_missed_interrupt_errors``
>    Indicates that the Rearm Queue interrupt was not serviced on time.
> @@ -1960,31 +1969,54 @@ Limitations
>
>  Tx Scheduling
>  ~~~~~~~~~~~~~
>
> -When PMD sees the ``RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME`` set on the packet
> -being sent it tries to synchronize the time of packet appearing on
> -the wire with the specified packet timestamp. If the specified one
> -is in the past it should be ignored, if one is in the distant future
> -it should be capped with some reasonable value (in range of seconds).
> -These specific cases ("too late" and "distant future") can be optionally
> -reported via device xstats to assist applications to detect the
> -time-related problems.
> -
> -The timestamp upper "too-distant-future" limit
> -at the moment of invoking the Tx burst routine
> -can be estimated as ``tx_pp`` option (in nanoseconds) multiplied by 2^23.
> +When the PMD sees ``RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME`` set on a
> +packet being sent it inserts a dedicated WAIT WQE to synchronize the
> +time of the packet appearing on the wire with the specified timestamp.
> +Every packet in a burst that carries the timestamp dynamic flag is
> +individually scheduled -- there is no restriction to the first packet only.
> +
> +If the specified timestamp is in the past, the packet is sent immediately.
> +If it is in the distant future it should be capped with some reasonable
> +value (in range of seconds). These specific cases ("too late" and
> +"distant future") can be optionally reported via device xstats to
> +assist applications to detect time-related problems.
> +
> +The eMPW (enhanced Multi-Packet Write) data path automatically breaks
> +the batch when a timestamped packet is encountered, ensuring each
> +scheduled packet gets its own WAIT WQE.
> +
> +Two hardware mechanisms are supported:
> +
> +**ConnectX-6 Dx -- Clock Queue (cross-channel)**
> +  The driver creates a Clock Queue and a Rearm Queue that together
> +  provide a time reference for scheduling. This mode requires the
> +  :ref:`tx_pp <mlx5_tx_pp_param>` devarg. The timestamp upper
> +  "too-distant-future" limit at the moment of invoking the Tx burst
> +  routine can be estimated as ``tx_pp`` (in nanoseconds) multiplied
> +  by 2^23.
> +
> +**ConnectX-7 and above -- wait-on-time**
> +  The hardware supports placing the scheduling delay directly inside
> +  the WQE descriptor. No Clock Queue or Rearm Queue is needed and the
> +  ``tx_pp`` devarg is **not** required. The driver automatically
> +  advertises send scheduling support when the HCA wait-on-time
> +  capability is detected.
> +
>  Please note, for the testpmd txonly mode, the limit is deduced from the expression::
>
>    (n_tx_descriptors / burst_size + 1) * inter_burst_gap
>
> -There is no any packet reordering according timestamps is supposed,
> -neither within packet burst, nor between packets, it is an entirely
> -application responsibility to generate packets and its timestamps
> -in desired order.
> +There is no packet reordering according to timestamps, neither within a
> +packet burst, nor between packets. It is entirely the application's
> +responsibility to generate packets and their timestamps in the desired
> +order.
>
>  Requirements
>  ^^^^^^^^^^^^
>
> +ConnectX-6 Dx (Clock Queue mode):
> +
>  ========= =============
>  Minimum   Version
>  ========= =============
> @@ -1996,20 +2028,35 @@ rdma-core
>  DPDK      20.08
>  ========= =============
>
> +ConnectX-7 and above (wait-on-time mode):
> +
> +========= =============
> +Minimum   Version
> +========= =============
> +hardware  ConnectX-7
> +========= =============
> +
>  Firmware configuration
>  ^^^^^^^^^^^^^^^^^^^^^^
>
>  Runtime configuration
>  ^^^^^^^^^^^^^^^^^^^^^
>
> -To provide the packet send scheduling on mbuf timestamps the ``tx_pp``
> -parameter should be specified.
> +**ConnectX-6 Dx**: the :ref:`tx_pp <mlx5_tx_pp_param>` parameter must
> +be specified to enable send scheduling on mbuf timestamps.
> +
> +**ConnectX-7+**: no devarg is required. Send scheduling is
> +automatically enabled when the HCA reports the wait-on-time capability.
> +
> +On both hardware generations the ``tx_skew`` parameter can be used to
> +compensate for the delay between descriptor processing and actual wire
> +time.
>
>  Limitations
>  ^^^^^^^^^^^
>
> -#. The timestamps can be put only in the first packet
> -   in the burst providing the entire burst scheduling.
> +#. On ConnectX-6 Dx (Clock Queue mode) timestamps too far in the future
> +   are capped (see the ``tx_pp`` x 2^23 limit above).
>
>
>  .. _mlx5_tx_inline:
> --
> 2.43.0
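
For readers who want to sanity-check the two limits quoted in the patch, the arithmetic can be sketched in plain C. This is only an illustration: the helper names ``tx_pp_future_limit_ns`` and ``txonly_limit`` are hypothetical and are not part of the mlx5 PMD or testpmd, and using the magnitude of a negative ``tx_pp`` is an assumption based on the "module of granularity" wording. The unit of the second result is whatever unit ``inter_burst_gap`` is given in, since the text does not state it.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not an mlx5 PMD function): estimate the
 * "too-distant-future" limit on ConnectX-6 Dx as tx_pp * 2^23 ns.
 * Assumption: a negative tx_pp (test mode) contributes its magnitude.
 */
static uint64_t tx_pp_future_limit_ns(int32_t tx_pp)
{
    uint64_t granularity_ns = (uint64_t)llabs((long long)tx_pp);

    return granularity_ns << 23; /* multiply by 2^23 */
}

/*
 * Hypothetical helper (not a testpmd function): evaluate the txonly
 * expression (n_tx_descriptors / burst_size + 1) * inter_burst_gap
 * using integer division, as the expression is written.
 */
static uint64_t txonly_limit(uint64_t n_tx_descriptors,
                             uint64_t burst_size,
                             uint64_t inter_burst_gap)
{
    assert(burst_size > 0); /* avoid division by zero */

    return (n_tx_descriptors / burst_size + 1) * inter_burst_gap;
}
```

With ``tx_pp=500``, the smallest allowed granularity, the first helper gives 500 * 2^23 = 4,194,304,000 ns, roughly 4.19 seconds, which matches the "in range of seconds" wording in the patch.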

