On Sat, 9 May 2026 23:56:51 +0200
Thomas Monjalon <[email protected]> wrote:
> This is a new feature in ethdev with tests and mlx5 implementation.
> Selective Rx allows to receive partial data,
> saving some hardware bandwidth.
>
> Note 1: mlx5 support patch is not correctly indented
> to make review easier. An indent patch follows to be squashed.
>
> Note 2: DTS patch is an attempt to test the feature on day 1,
> it is not mandatory if it is blocking the merge.
Larger patch needed bigger AI model than default CI.
On v2 of the selective Rx series:
[PATCH v2 03/10] app/testpmd: support selective Rx
Warning: regression for the existing "--rxpkts only" case.
The new condition
if (rx_pkt_nb_segs > 1 || rx_pkt_nb_offs > 0)
enters the new path even when rx_pkt_nb_offs == 0, which used to
go through "rx_seg->offset = i < rx_pkt_nb_offs ? ... : 0;".
In the new code, seg_offset falls back to next_offset:
seg_offset = i < rx_pkt_nb_offs ?
rx_pkt_seg_offsets[i] : next_offset;
...
rx_seg->offset = seg_offset;
next_offset = seg_offset + rx_seg->length;
With "--rxpkts=14,64" (nb_offs=0), iteration 0 gives offset=0
length=14 next_offset=14; iteration 1 gives seg_offset=14,
so rx_seg[1].offset = 14. Old code set offset=0 for both.
Per the commit message, the new path is intended only when
rxoffs and rxpkts are both specified. Either restrict the
condition to rx_pkt_nb_offs > 0, or keep the old offset
assignment for the no-offsets case, e.g.:
seg_offset = i < rx_pkt_nb_offs ? rx_pkt_seg_offsets[i] :
(rx_pkt_nb_offs > 0 ? next_offset : 0);
Info: the trailing discard length
rx_seg->length = dev_info.max_rx_pktlen - next_offset;
is uint32_t-uint16_t assigned to a uint16_t length field.
Truncates if max_rx_pktlen exceeds 65535. Edge case but
worth a clamp.
[PATCH v2 06/10] net/mlx5: support selective Rx
Error: NULL pointer dereference in mlx5_rx_burst() when the
first Rx segment has a NULL mempool.
The CQE polling and pkt assignment are now nested inside
"if (seg->pool) { ... }":
if (seg->pool) {
rep = rte_mbuf_raw_alloc(seg->pool);
...
if (!pkt) {
cqe = ...
len = mlx5_rx_poll_len(...);
...
pkt = seg;
...
PKT_LEN(pkt) = len;
...
}
...
}
if (len > DATA_LEN(seg)) {
...
}
if (seg->pool) {
DATA_LEN(seg) = len;
data_seg_len += len;
}
PKT_LEN(pkt) = RTE_MIN(PKT_LEN(pkt), data_seg_len);
When rxq elt 0 is a null mbuf (seg->pool == NULL), the entire
if (seg->pool) block is skipped: pkt stays NULL, len stays 0.
len > DATA_LEN(seg) is false (DATA_LEN was set to seg->length
in rxq_alloc_elts_sprq). Control falls through to
"PKT_LEN(pkt) = RTE_MIN(...)" with pkt == NULL.
This is reachable from testpmd "--rxoffs=N --rxpkts=M" with
N > 0, which produces [discard, real, discard]. mlx5_rxq_new
only rejects configurations where every segment has NULL mp;
it does not reject first-null configurations. Patch 10's
selective_rx_payload_only and selective_rx_two_segments tests
both produce first-null layouts.
Options: reject first-null configurations in mlx5_rxq_new(),
or hoist the CQE-polling block out of "if (seg->pool)" so
pkt and len are established regardless, and guard the trailing
PKT_LEN/stats/data_seg_len reset with "if (pkt)".
Info: the rx_seg->offset semantics for non-discard segments
with selective Rx active are not documented anywhere added in
patch 02. The struct comment still reads "Data offset from
beginning of mbuf data buffer". With patch 03 emitting
cumulative wire offsets into rx_seg->offset for real segments
(e.g. rx_seg[2].offset = 128 for "--rxoffs=0,128 --rxpkts=14,64"),
mbuf data is placed at +128 in the data buffer, wasting the
first 128 bytes of headroom. If that placement is intentional,
the struct doc should say so; if not, real segments need
offset=0 and only discard segments carry wire-position info.
Note: patch 07 must be squashed into 06 before merging, as the
commit message states. The bug above persists through 07 (same
code, just reindented).