On Tue, 2 Jun 2026 23:49:12 +0200
Thomas Monjalon <[email protected]> wrote:
> This is a new feature in ethdev with tests and mlx5 implementation.
> Selective Rx allows to receive partial data,
> saving some hardware bandwidth.
>
> Note 1: mlx5 support patch is not correctly indented
> to make review easier. An indent patch follows to be squashed.
>
> Note 2: DTS patch is an attempt to test the feature on day 1,
> it is not mandatory if it is blocking the merge.
>
> v2: rework after Gregory
> v3: fix bugs found with AI by Stephen
> v4: fix packet type in DTS test
> v5: fix mlx5 Rx to handle discarding first segment
> v6: fix reindent patch
>
>
> Gregory Etelson (4):
>   ethdev: introduce selective Rx
>   app/testpmd: support selective Rx
>   common/mlx5: add null MR functions
>   net/mlx5: support selective Rx
>
> Thomas Monjalon (6):
>   app/testpmd: print Rx split capabilities
>   net/mlx5: fix Rx split segment counter type
>   net/mlx5: reindent previous changes
>   common/mlx5: remove callbacks for MR registration
>   dts: fix topology capability comparison
>   dts: add selective Rx tests
>
> app/test-pmd/config.c | 17 ++
> app/test-pmd/testpmd.c | 71 ++++-
> devtools/libabigail.abignore | 7 +
> doc/guides/nics/features.rst | 14 +
> doc/guides/nics/features/default.ini | 1 +
> doc/guides/nics/features/mlx5.ini | 1 +
> doc/guides/nics/mlx5.rst | 86 ++++--
> doc/guides/rel_notes/release_26_07.rst | 11 +
> doc/guides/testpmd_app_ug/run_app.rst | 20 ++
> drivers/common/mlx5/linux/mlx5_common_verbs.c | 53 ++--
> drivers/common/mlx5/mlx5_common.c | 6 +-
> drivers/common/mlx5/mlx5_common_mr.c | 37 ++-
> drivers/common/mlx5/mlx5_common_mr.h | 29 +-
> drivers/common/mlx5/windows/mlx5_common_os.c | 31 ++-
> drivers/compress/mlx5/mlx5_compress.c | 4 +-
> drivers/crypto/mlx5/mlx5_crypto.h | 2 -
> drivers/crypto/mlx5/mlx5_crypto_gcm.c | 6 +-
> drivers/net/mlx5/mlx5.c | 7 +
> drivers/net/mlx5/mlx5.h | 4 +-
> drivers/net/mlx5/mlx5_ethdev.c | 25 ++
> drivers/net/mlx5/mlx5_flow_aso.c | 21 +-
> drivers/net/mlx5/mlx5_flow_hw.c | 11 +-
> drivers/net/mlx5/mlx5_flow_quota.c | 6 +-
> drivers/net/mlx5/mlx5_hws_cnt.c | 19 +-
> drivers/net/mlx5/mlx5_rx.c | 186 ++++++++-----
> drivers/net/mlx5/mlx5_rx.h | 5 +-
> drivers/net/mlx5/mlx5_rxq.c | 75 +++--
> drivers/net/mlx5/mlx5_trigger.c | 64 ++++-
> dts/api/capabilities.py | 2 +
> dts/api/testpmd/__init__.py | 17 ++
> dts/api/testpmd/types.py | 6 +
> dts/framework/testbed_model/capability.py | 10 +-
> dts/tests/TestSuite_rx_split.py | 262 ++++++++++++++++++
> lib/ethdev/rte_ethdev.c | 24 +-
> lib/ethdev/rte_ethdev.h | 14 +-
> 35 files changed, 880 insertions(+), 274 deletions(-)
> create mode 100644 dts/tests/TestSuite_rx_split.py
>
Still has some issues:
Patch 6: net/mlx5: support selective Rx
Error: after a non-critical Rx error CQE, the next packet is processed with
a stale length and stale CQE; the queue effectively wedges.
To avoid re-polling the CQE on each leading discard segment of one packet,
v6 wraps the poll in "if (len == 0)" and resets len to 0 at the points where
a packet finishes. Two of the three exits reset it (normal completion and
the "no real segment found" skip both do "len = 0;"), but the non-critical
error-CQE path does not:
    if (unlikely(len & MLX5_ERROR_CQE_MASK)) {
        if (seg->pool)
            rte_mbuf_raw_free(rep);
        if (len == MLX5_CRITICAL_ERROR_CQE_RET) {
            rq_ci = rxq->rq_ci << sges_n;
            break;
        }
        rq_ci >>= sges_n;
        rq_ci += skip_cnt;
        rq_ci <<= sges_n;
        MLX5_ASSERT(!pkt);
        /* len still == MLX5_ERROR_CQE_MASK (0x40000000) */
        continue;
    }
MLX5_ERROR_CQE_MASK is 0x40000000, so len is non-zero on this continue. On the
next iteration "if (len == 0)" is false, so mlx5_rx_poll_len() is skipped
entirely. The following real segment then sets pkt with PKT_LEN(pkt) ==
0x40000000 and rxq_cq_to_mbuf() reads the stale cqe/mcqe. Because
0x40000000 > DATA_LEN(seg), the "more data" branch keeps consuming
descriptors as one bogus giant packet, walking the whole ring and emitting
nothing. Pre-v6 this worked because the poll was unconditional in the !pkt
block.
Suggested fix: reset len before the error-skip continue, matching the other
two exits:
        rq_ci >>= sges_n;
        rq_ci += skip_cnt;
        rq_ci <<= sges_n;
        MLX5_ASSERT(!pkt);
        len = 0;
        continue;
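
To make the failure mode easier to see outside the driver, here is a toy
model of the loop. It is only a sketch, not the mlx5 code: fake_cq,
fake_poll_len(), run_burst() and SEG_DATA_LEN are hypothetical names, the
completion queue is a scripted array, and the model assumes each error skip
consumes exactly one descriptor. Only the "poll when len == 0" guard and the
0x40000000 mask value are taken from the analysis above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ERROR_CQE_MASK 0x40000000u /* value of MLX5_ERROR_CQE_MASK quoted above */
#define SEG_DATA_LEN   2048u       /* hypothetical per-segment buffer size */

/* Hypothetical scripted completion queue: one length per completion. */
struct fake_cq {
	const uint32_t *lens;
	unsigned int n;
	unsigned int next;
};

/* Stand-in for mlx5_rx_poll_len(): returns 0 when nothing is pending. */
static uint32_t
fake_poll_len(struct fake_cq *cq)
{
	return cq->next < cq->n ? cq->lens[cq->next++] : 0;
}

/*
 * Walk ring_size descriptors and return the number of packets delivered.
 * reset_on_error selects whether len is cleared before the error-skip
 * continue (the suggested fix) or left stale (the v6 behaviour described
 * above). *polls counts calls to the poll function.
 */
static unsigned int
run_burst(struct fake_cq *cq, bool reset_on_error,
	  unsigned int ring_size, unsigned int *polls)
{
	uint32_t len = 0;    /* 0 means "poll for a new completion" */
	bool in_pkt = false; /* models pkt != NULL */
	unsigned int delivered = 0;

	assert(cq != NULL && polls != NULL);
	*polls = 0;
	for (unsigned int desc = 0; desc < ring_size; desc++) {
		if (!in_pkt) {
			if (len == 0) {
				len = fake_poll_len(cq);
				(*polls)++;
				if (len == 0)
					break; /* no more completions */
				if (len & ERROR_CQE_MASK) {
					if (reset_on_error)
						len = 0;
					continue; /* skip the bad packet */
				}
			}
			in_pkt = true; /* first segment of a packet */
		}
		if (len > SEG_DATA_LEN) {
			len -= SEG_DATA_LEN; /* "more data": take next descriptor */
			continue;
		}
		delivered++; /* last segment, packet complete */
		in_pkt = false;
		len = 0;
	}
	return delivered;
}
```

With completions { 64, ERROR_CQE_MASK, 128 } and a 16-entry ring, the model
delivers both good packets when reset_on_error is true. When it is false,
only the first packet is delivered: the third completion is never polled and
the remaining descriptors are swallowed by the bogus 0x40000000-byte packet.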
Minor: "tail = seg;" is now set in both the "first real segment" block and
the "real segment: replenish WQE" block; the first is redundant since the
second always runs for the same segment. Harmless, but the duplicate can be
dropped.