On Sat, Dec 23, 2023 at 3:58 AM Alexander Lobakin <[email protected]> wrote: > > Currently, idpf uses the following model for the header buffers: > > * buffers are allocated via dma_alloc_coherent(); > * when receiving, napi_alloc_skb() is called and then the header is > copied to the newly allocated linear part. > > This is far from optimal as DMA coherent zone is slow on many systems > and memcpy() neutralizes the idea and benefits of the header split. > Instead, use libie to create page_pools for the header buffers, allocate > them dynamically and then build an skb via napi_build_skb() around them > with no memory copy. With one exception... > When you enable header split, you except you'll always have a separate > header buffer, so that you could reserve headroom and tailroom only > there and then use full buffers for the data. For example, this is how > TCP zerocopy works -- you have to have the payload aligned to PAGE_SIZE. > The current hardware running idpf does *not* guarantee that you'll > always have headers placed separately. For example, on my setup, even > ICMP packets are written as one piece to the data buffers. You can't > build a valid skb around a data buffer in this case. > To not complicate things and not lose TCP zerocopy etc., when such thing > happens, use the empty header buffer and pull either full frame (if it's > short) or the Ethernet header there and build an skb around it. GRO > layer will pull more from the data buffer later. This W/A will hopefully > be removed one day.
We definitely want performance numbers here, for systems that truly matter. We spent a lot of time trying to make idpf slightly better than it was, we do not want regressions. Thank you.
