On Fri, Nov 21, 2025 at 08:57:30AM -0800, Stephen Hemminger wrote:
> On Fri, 21 Nov 2025 10:35:35 +0000
> Morten Brørup <[email protected]> wrote:
>
> > The implementation for copying up to 64 bytes does not depend on address
> > alignment with the size of the CPU's vector registers, so the code
> > handling this was moved from the various implementations to the common
> > function.
> >
> > Furthermore, the function for copying less than 16 bytes was replaced with
> > a smarter implementation using fewer branches and potentially fewer
> > load/store operations.
> > This function was also extended to handle copying of up to 16 bytes,
> > instead of up to 15 bytes. This small extension reduces the code path for
> > copying two pointers.
> >
> > These changes provide two benefits:
> > 1. The memory footprint of the copy function is reduced.
> > Previously there were two instances of the compiled code to copy up to 64
> > bytes, one in the "aligned" code path, and one in the "generic" code path.
> > Now there is only one instance, in the "common" code path.
> > 2. The performance for copying up to 64 bytes is improved.
> > The memcpy performance test shows cache-to-cache copying of up to 32 bytes
> > now typically only takes 2 cycles (4 cycles for 64 bytes) versus
> > ca. 6.5 cycles before this patch.
> >
> > And finally, the missing implementation of rte_mov48() was added.
> >
> > Signed-off-by: Morten Brørup <[email protected]>
>
> As I have said before would rather that DPDK move away from having its
> own specialized memcpy. How is this compared to stock inline gcc?
> The main motivation is that the glibc/gcc team does more testing across
> multiple architectures and has a community with more expertise on CPU
> special cases.

I would tend to agree. Even if we get rte_memcpy a few cycles faster, I
suspect many apps wouldn't notice the difference. However, I understand that
the virtio/vhost libraries gain from using rte_memcpy over standard memcpy -
or at least used to. Perhaps we can consider deprecating rte_memcpy and just
putting a vhost-specific memcpy in that library?

/Bruce
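
For context on the quoted patch description: a copy of up to 16 bytes "using fewer branches and potentially fewer load/store operations" is commonly done with overlapping head and tail accesses. The sketch below illustrates that general idea only. It is an assumption about the kind of technique meant, not the code from the patch, and copy_upto16 is a hypothetical name that does not exist in DPDK.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not part of DPDK): copy n bytes, 0 <= n <= 16,
 * from src to dst using at most two loads and two stores per size class.
 * The head and tail accesses may overlap in the middle, which is harmless
 * because both write the same source bytes.
 */
static inline void copy_upto16(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(n <= 16);

    if (n >= 8) {
        /* 8..16 bytes: first 8 and last 8 bytes. */
        uint64_t head, tail;
        memcpy(&head, s, sizeof(head));
        memcpy(&tail, s + n - sizeof(tail), sizeof(tail));
        memcpy(d, &head, sizeof(head));
        memcpy(d + n - sizeof(tail), &tail, sizeof(tail));
    } else if (n >= 4) {
        /* 4..7 bytes: first 4 and last 4 bytes. */
        uint32_t head, tail;
        memcpy(&head, s, sizeof(head));
        memcpy(&tail, s + n - sizeof(tail), sizeof(tail));
        memcpy(d, &head, sizeof(head));
        memcpy(d + n - sizeof(tail), &tail, sizeof(tail));
    } else if (n >= 2) {
        /* 2..3 bytes: first 2 and last 2 bytes. */
        uint16_t head, tail;
        memcpy(&head, s, sizeof(head));
        memcpy(&tail, s + n - sizeof(tail), sizeof(tail));
        memcpy(d, &head, sizeof(head));
        memcpy(d + n - sizeof(tail), &tail, sizeof(tail));
    } else if (n == 1) {
        d[0] = s[0];
    }
    /* n == 0: nothing to do. */
}
```

The fixed-size memcpy() calls are used here only to express unaligned loads and stores portably; compilers typically lower them to single move instructions. Note that the n >= 8 branch covers exactly 16 bytes (two pointers on a 64-bit target) with two 8-byte loads and two 8-byte stores.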

