On Fri, 29 May 2026 15:34:00 +0200
Thomas Monjalon <[email protected]> wrote:

> From: Gregory Etelson <[email protected]>
> 
> Selective Rx may save some PCI bandwidth.
> Implement selective Rx in the (quite slow) scalar SPRQ Rx path
> mlx5_rx_burst() where the performance impact
> of the added condition branches is acceptable.
> Other Rx functions do not support this feature.
> When using selective Rx, mlx5_rx_burst will be selected.
> 
> A null Memory Region (MR) is always allocated
> at shared device context initialization.
> The selective Rx capability is not advertised
> if this special MR allocation fails.
> 
> For each Rx segment configured with a NULL mempool,
> a "null mbuf" is created.
> It is a fake mbuf allocated outside any mempool,
> used as a placeholder in the Rx ring.
> The null MR lkey is used in the WQE for these segments
> so the NIC writes received data to a discard buffer.
> The mbuf data room size is resolved from the first segment having a pool.
> For null segments, the buffer length is from the last seen pool,
> so that the WQE stride size remains consistent.
> 
> In mlx5_rx_burst, discarded segments are not chained
> into the packet mbuf list, NB_SEGS is decremented accordingly,
> and no replacement buffer is allocated.
> A separate data_seg_len accumulator tracks the total length
> of delivered segments only.
> The packet length is adjusted to reflect only the data
> actually delivered to the application.
> 
> Signed-off-by: Gregory Etelson <[email protected]>
> Signed-off-by: Thomas Monjalon <[email protected]>
> ---

AI review with Opus 4.8 and High setting found one issue:

Patch 6: net/mlx5: support selective Rx

Error: NULL pointer dereference when the first configured Rx segment is a
discard segment (mp == NULL).

In mlx5_rx_burst() the head mbuf and the chain tail are tracked like this:

if (pkt) {
    if (rep->pool)
        NEXT(tail) = rep;
    else
        --NB_SEGS(pkt);
}
...
if (seg->pool) {
    tail = seg;
    ...
}

tail is only ever assigned inside "if (seg->pool)", and pkt is set to the
first processed segment unconditionally (pkt = seg in the !pkt block, no
pool guard). So if the first segment of a packet is a discard segment:

    pkt becomes the null_mbuf (pool == NULL), tail stays NULL;
    on the next (real) segment, rep->pool is set, so NEXT(tail) = rep executes 
with tail == NULL -> write through NULL.

Even without the crash, returning the pool-less null_mbuf as the packet
head is wrong: the application later frees it back to a NULL pool.

This is reachable, not theoretical. testpmd (patch 3) inserts a leading
mp==NULL segment whenever the first offset is > 0 (seg_offset > next_offset
with next_offset starting at 0), ethdev check_split (patch 2) now permits a
leading NULL mp, and mlx5_rxq_new() accepts it (first_mp is just the first
non-NULL pool; there is no requirement that rxseg[0].mp != NULL). The DTS
cases selective_rx_payload_only (rxoffs=[34]) and selective_rx_two_segments
(rxoffs=[14,...]) in patch 10 configure exactly this layout, and
mlx5_selective_rx_enabled() forces the scalar mlx5_rx_burst path, so the
buggy path is the one that runs.

Trace for rxoffs=34 / rxpkts=payload (segments: discard[0,34) real[34,290)
discard[290,max)):

iter0 (discard head): pkt == NULL, seg->pool == NULL -> pkt = null_mbuf,
tail not set; len(290) > DATA_LEN(34) -> ++NB_SEGS, continue.
iter1 (real seg):      pkt set, rep->pool != NULL -> NEXT(tail==NULL)=rep.

Suggested fix: a discard segment must not become the packet head/tail.
Either reject rxseg[0].mp == NULL in mlx5_rxq_new() (cleanest, matches the
"deliver last N bytes" case being unsupported here), or make the data path
skip leading discard segments without assigning them to pkt and only set
pkt/tail on the first segment with a pool. If leading discard is intended
to be supported, the head selection and NEXT(tail) linking both need to
account for tail == NULL.

The same head/tail assumption also means a packet that falls entirely
within a leading discard segment would be returned with a NULL-pool head;
fixing the above covers that too.

Reply via email to