On Tue, 27 Jan 2026 10:13:54 -0800
[email protected] wrote:

> From: Scott Mitchell <[email protected]>
>
> - Add rte_prefetch0() to prefetch next frame/mbuf while processing
>   current packet, reducing cache miss latency

Makes sense. If you really want to dive deeper, there are more unrolled
loop patterns possible; fd.io uses a multi-step unrolled loop pattern.
The reason is that a prefetch issued only one packet ahead is usually
useless, since it has little time to complete before the data is
needed, whereas prefetching farther ahead helps.

> - Replace memcpy() with rte_memcpy() for optimized copy operations

There is no good reason that rte_memcpy() should be faster than
memcpy(). Some cases were observed with virtio, but my hunch is that
this is because the two routines make different alignment assumptions.

> - Use rte_pktmbuf_free_bulk() in TX path instead of individual
>   rte_pktmbuf_free() calls for better batch efficiency

Makes sense.

> - Add unlikely() hints for error paths (oversized packets, VLAN
>   insertion failures, sendto errors) to optimize branch prediction

Also makes sense.

> - Remove unnecessary early nb_pkts == 0 when loop handles this
>   and app may never call with 0 frames.

Yes, calling tx/rx burst with nb_pkts == 0 only needs to work; it does
not need a short circuit.

> Signed-off-by: Scott Mitchell <[email protected]>
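To illustrate the prefetch-ahead idea, here is a minimal sketch in plain C. It is not the driver code. It assumes GCC or Clang builtins as stand-ins for rte_prefetch0() and unlikely(); struct pkt, process_burst(), and the prefetch distance of 4 are hypothetical names and values picked for illustration, and the right distance would have to be found by measurement. The same loop shows that a burst of zero packets falls through without a dedicated early return.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins (assumption): compiler builtins replace DPDK's
 * rte_prefetch0() and unlikely() so this builds without DPDK. */
#define PREFETCH0(p) __builtin_prefetch((p), 0, 3)
#define UNLIKELY(x)  __builtin_expect(!!(x), 0)

/* Assumed prefetch distance; tune by measurement on real traffic. */
#define PREFETCH_AHEAD 4

/* Hypothetical stand-in for an mbuf. */
struct pkt {
    uint16_t len;
    uint8_t data[64];
};

/* Adds the payload bytes of every valid packet to *sum and returns
 * the number of packets processed. Oversized packets are skipped. */
static unsigned
process_burst(struct pkt **pkts, unsigned nb_pkts, uint64_t *sum)
{
    unsigned i, j, done = 0;

    assert(pkts != NULL || nb_pkts == 0);

    /* Prime the pipeline: prefetch the first few packets up front. */
    for (i = 0; i < PREFETCH_AHEAD && i < nb_pkts; i++)
        PREFETCH0(pkts[i]);

    /* With nb_pkts == 0 neither loop body runs, so no short circuit
     * is needed. */
    for (i = 0; i < nb_pkts; i++) {
        struct pkt *p = pkts[i];

        /* Prefetch several packets ahead rather than just the next
         * one, so the load has time to complete before it is used. */
        if (i + PREFETCH_AHEAD < nb_pkts)
            PREFETCH0(pkts[i + PREFETCH_AHEAD]);

        /* Error path marked as unlikely for the branch predictor. */
        if (UNLIKELY(p->len > sizeof(p->data)))
            continue;

        for (j = 0; j < p->len; j++)
            *sum += p->data[j];
        done++;
    }
    return done;
}
```

The fd.io style goes further by unrolling the body to handle two or four packets per iteration; the sketch above only varies the prefetch distance and keeps a single-packet body.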

