MBUF and MLX5 maintainers,

I'm picking up an old discussion, which you might consider pursuing. Feel free
to ignore this if you consider the discussion irrelevant or already closed.

The Techboard has previously discussed the organization of the mbuf fields. 
Ref: http://mails.dpdk.org/archives/dev/2020-November/191859.html

The conclusion was that there was no measured performance difference whether
the "pool" or the "next" field was in the first cacheline, so it was decided to
put the "pool" field there. Further optimization of the mbuf field organization
could be reconsidered later.

I have been looking at it. In theory it should not be required to touch the 
"pool" field at RX. But the "next" field must be written for segmented packets.

I think you could achieve an RX performance gain in the MLX5 driver if the mbuf 
structure was changed so the "next" and "pool" fields were swapped (i.e. 
putting "next" in the first cacheline), and /drivers/net/mlx5/mlx5_rx.c line 
821 was modified to replace "rep = rte_mbuf_raw_alloc(seg->pool)" with 
something conceptually like "rep = rte_mbuf_raw_alloc(rxq->pool)". Then you 
don't have to touch the mbuf's "pool" field (residing in the second cacheline 
with this change) during RX. This way, you would only touch the mbuf's first 
cacheline during RX.
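
A rough sketch of the idea, using a simplified stand-in for the mbuf rather
than the real struct rte_mbuf. All toy_* names and the 64-byte cacheline size
are assumptions made for illustration; the field set is heavily reduced. It
only shows that, with "next" in the first cacheline and "pool" in the second,
chaining a segment at RX writes first-cacheline fields only:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_CACHE_LINE 64       /* assumed cacheline size */

struct toy_pool;                /* hypothetical stand-in for struct rte_mempool */

/* Simplified model of the proposed layout, NOT the real struct rte_mbuf. */
struct toy_mbuf {
    /* first cacheline: fields written at RX */
    void *buf_addr;
    struct toy_mbuf *next;      /* proposed: moved into the first cacheline */
    uint32_t pkt_len;
    uint16_t data_len;
    uint16_t nb_segs;

    /* second cacheline: not needed at RX if the driver uses the queue's pool */
    _Alignas(TOY_CACHE_LINE) struct toy_pool *pool;
};

/* Compile-time check that the layout matches the proposal. */
static_assert(offsetof(struct toy_mbuf, next) < TOY_CACHE_LINE,
              "next must reside in the first cacheline");
static_assert(offsetof(struct toy_mbuf, pool) >= TOY_CACHE_LINE,
              "pool must reside in the second cacheline");

/*
 * Append a segment to a packet at RX.
 * Every field written here sits in the first cacheline of its mbuf.
 */
static inline void
toy_rx_append(struct toy_mbuf *head, struct toy_mbuf *last,
              struct toy_mbuf *seg, uint16_t len)
{
    seg->data_len = len;
    seg->next = NULL;
    last->next = seg;
    head->pkt_len += len;
    head->nb_segs++;
}
```

The replacement mbuf would then be allocated from a pool pointer kept in the
RX queue structure, so the "pool" field of the received mbuf is never read in
this path.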

My suggested optimization might be purely theoretical: Many applications touch 
the mbuf's second cacheline shortly after RX anyway.

If you don't pursue this mbuf reorganization, the comment on the mbuf's
cacheline1 field is incorrect and should be updated:
- /* second cache line - fields only used in slow path or on TX */
+ /* second cache line - fields mainly used in slow path or on TX */

-Morten
