02/06/2026 15:53, Stephen Hemminger:
> On Fri, 29 May 2026 15:34:00 +0200
> Thomas Monjalon <[email protected]> wrote:
> 
> > From: Gregory Etelson <[email protected]>
> > 
> > Selective Rx may save some PCI bandwidth.
> > Implement selective Rx in the (quite slow) scalar SPRQ Rx path
> > mlx5_rx_burst() where the performance impact
> > of the added condition branches is acceptable.
> > Other Rx functions do not support this feature.
> > When using selective Rx, mlx5_rx_burst will be selected.
> > 
> > A null Memory Region (MR) is always allocated
> > at shared device context initialization.
> > The selective Rx capability is not advertised
> > if this special MR allocation fails.
> > 
> > For each Rx segment configured with a NULL mempool,
> > a "null mbuf" is created.
> > It is a fake mbuf allocated outside any mempool,
> > used as a placeholder in the Rx ring.
> > The null MR lkey is used in the WQE for these segments
> > so the NIC writes received data to a discard buffer.
> > The mbuf data room size is resolved from the first segment having a pool.
> > For null segments, the buffer length is from the last seen pool,
> > so that the WQE stride size remains consistent.
> > 
> > In mlx5_rx_burst, discarded segments are not chained
> > into the packet mbuf list, NB_SEGS is decremented accordingly,
> > and no replacement buffer is allocated.
> > A separate data_seg_len accumulator tracks the total length
> > of delivered segments only.
> > The packet length is adjusted to reflect only the data
> > actually delivered to the application.
> > 
> > Signed-off-by: Gregory Etelson <[email protected]>
> > Signed-off-by: Thomas Monjalon <[email protected]>
> > ---
> 
> AI review with Opus 4.8 and High setting found one issue:
> 
> Patch 6: net/mlx5: support selective Rx
> 
> Error: NULL pointer dereference when the first configured Rx segment is a
> discard segment (mp == NULL).
> 
> In mlx5_rx_burst() the head mbuf and the chain tail are tracked like this:
> 
> if (pkt) {
>     if (rep->pool)
>         NEXT(tail) = rep;
>     else
>         --NB_SEGS(pkt);
> }
> ...
> if (seg->pool) {
>     tail = seg;
>     ...
> }
> 
> tail is only ever assigned inside "if (seg->pool)", and pkt is set to the
> first processed segment unconditionally (pkt = seg in the !pkt block, no
> pool guard). So if the first segment of a packet is a discard segment:
> 
>     pkt becomes the null_mbuf (pool == NULL), tail stays NULL;
>     on the next (real) segment, rep->pool is set, so NEXT(tail) = rep 
> executes with tail == NULL -> write through NULL.
> 
> Even without the crash, returning the pool-less null_mbuf as the packet
> head is wrong: the application later frees it back to a NULL pool.
> 
> This is reachable, not theoretical. testpmd (patch 3) inserts a leading
> mp==NULL segment whenever the first offset is > 0 (seg_offset > next_offset
> with next_offset starting at 0), ethdev check_split (patch 2) now permits a
> leading NULL mp, and mlx5_rxq_new() accepts it (first_mp is just the first
> non-NULL pool; there is no requirement that rxseg[0].mp != NULL). The DTS
> cases selective_rx_payload_only (rxoffs=[34]) and selective_rx_two_segments
> (rxoffs=[14,...]) in patch 10 configure exactly this layout, and
> mlx5_selective_rx_enabled() forces the scalar mlx5_rx_burst path, so the
> buggy path is the one that runs.
> 
> Trace for rxoffs=34 / rxpkts=payload (segments: discard[0,34) real[34,290)
> discard[290,max)):
> 
> iter0 (discard head): pkt == NULL, seg->pool == NULL -> pkt = null_mbuf,
> tail not set; len(290) > DATA_LEN(34) -> ++NB_SEGS, continue.
> iter1 (real seg):      pkt set, rep->pool != NULL -> NEXT(tail==NULL)=rep.
> 
> Suggested fix: a discard segment must not become the packet head/tail.
> Either reject rxseg[0].mp == NULL in mlx5_rxq_new() (cleanest, matches the
> "deliver last N bytes" case being unsupported here), or make the data path
> skip leading discard segments without assigning them to pkt and only set
> pkt/tail on the first segment with a pool. If leading discard is intended
> to be supported, the head selection and NEXT(tail) linking both need to
> account for tail == NULL.
> 
> The same head/tail assumption also means a packet that falls entirely
> within a leading discard segment would be returned with a NULL-pool head;
> fixing the above covers that too.

Yes sorry for missing that.
I fix it by assigning packet after the first real segment.
I take this opportunity to add few comments on the branching conditions.


Reply via email to