> -----Original Message-----
> From: Stephen Hemminger <[email protected]>
> Sent: 16 June 2026 19:41
> To: Maxime Leroy <[email protected]>
> Cc: [email protected]; Hemant Agrawal <[email protected]>; Sachin
> Saxena <[email protected]>
> Subject: Re: [PATCH v2 6/6] net/dpaa2: drop the fake software VLAN strip
> offload
> Importance: High
> 
> On Tue, 16 Jun 2026 12:27:26 +0200
> Maxime Leroy <[email protected]> wrote:
> 
> > RTE_ETH_RX_OFFLOAD_VLAN_STRIP is advertised, but no hardware VLAN
> > strip backs it: when enabled, the Rx burst calls rte_vlan_strip() on
> > every frame, a software op masquerading as a hardware offload.
> >
> > It saves a forwarding application nothing: the datapath reads the L2
> > header anyway to classify or strip. The offload does not remove that
> > read, it relocates it into the driver Rx burst, where it is far more
> > expensive.
> >
> > The cost is a matter of timing. rte_vlan_strip() reaches the L2 header
> > through rte_pktmbuf_mtod(), which dereferences mbuf->buf_addr. On a
> > freshly recycled buffer that mbuf cacheline is cold. eth_fd_to_mbuf()
> > has just written other fields of it (data_off, ol_flags), but buf_addr
> > is a persistent field it does not rewrite. A write does not stall: it
> > posts to the store buffer while the line fills in the background, and
> > the rewritten fields are forwarded straight from there. buf_addr has
> > nothing to forward, so it must be read from the line, whose fill is
> > still in flight, and the read stalls. The ethertype read that follows,
> > on the cold payload line, stalls again. Read later by the application,
> > when the fill has completed, the same read hits. The offload just
> > performs it at the worst possible moment.
> >
> > Measured on a single-core port-to-port forwarding test over two 10G
> > ports (one core at 2 GHz, 64-byte untagged frames):
> >
> >   - throughput 4.22 -> 5.00 Mpps (+18 percent)
> >   - IPC 0.93 -> 1.25: the cost was memory stall, not compute
> >   - L3/DRAM-bound L2 refills 319M -> 200M over 10s (-37 percent)
> >
> > perf confirms it: with the offload, the buf_addr load (the cold mbuf
> > field) and the payload load account for about 84 percent of the Rx
> > burst's L2 refills; removing it, those vanish and only the inherent
> > DQRR dequeue misses remain.
> >
> > Stop advertising VLAN_STRIP and remove the rte_vlan_strip() calls from
> > every Rx path. This is a behavioural change: the tag is left in the
> > frame, so an application must strip it itself, on the L2 header it
> > already reads.
> >
> > Signed-off-by: Maxime Leroy <[email protected]>
> 
> Acked-by: Stephen Hemminger <[email protected]>
Acked-by: Hemant Agrawal <[email protected]>
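
For applications that relied on the offload, here is a minimal sketch of
stripping the tag on the L2 header the application already reads. It is an
illustration only: it works on a plain byte buffer rather than an rte_mbuf,
handles a single 802.1Q tag, and the names strip_vlan(), ETH_ADDRS_LEN,
VLAN_TAG_LEN and TPID_8021Q are hypothetical, not taken from the patch. In a
DPDK application rte_vlan_strip() from rte_ether.h does the equivalent on an
mbuf.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ADDRS_LEN 12      /* destination MAC + source MAC */
#define VLAN_TAG_LEN  4       /* TPID (2 bytes) + TCI (2 bytes) */
#define TPID_8021Q    0x8100  /* 802.1Q tag protocol identifier */

/*
 * Strip one 802.1Q tag from a frame held in a plain byte buffer.
 * On success, returns the new start of the frame, stores the TCI in *tci
 * and shrinks *len by the tag length. Returns NULL and leaves the frame
 * untouched if it is untagged or too short to hold a tag plus ethertype.
 */
static uint8_t *strip_vlan(uint8_t *frame, size_t *len, uint16_t *tci)
{
    if (*len < ETH_ADDRS_LEN + VLAN_TAG_LEN + 2)
        return NULL;

    /* The outer ethertype sits right after the two MAC addresses. */
    uint16_t tpid = (uint16_t)((frame[12] << 8) | frame[13]);
    if (tpid != TPID_8021Q)
        return NULL;

    *tci = (uint16_t)((frame[14] << 8) | frame[15]);

    /* Slide the MAC addresses forward over the tag (regions overlap). */
    memmove(frame + VLAN_TAG_LEN, frame, ETH_ADDRS_LEN);
    *len -= VLAN_TAG_LEN;
    return frame + VLAN_TAG_LEN;
}
```

The sketch moves the 12 address bytes instead of the payload, so the cost
does not depend on frame length. By the time the application classifies the
frame, the header line is already in cache.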
