> From: Stephen Hemminger [mailto:step...@networkplumber.org]
> Sent: Saturday, 2 March 2024 17.38
>
> On Sat, 2 Mar 2024 14:05:45 +0100
> Morten Brørup <m...@smartsharesystems.com> wrote:
>
> > > > My experience with replacing rte_memcpy() with memcpy() (or vice versa)
> > > > is mixed.
> > > >
> > > > I've also tried just dropping the DPDK-custom memcpy() implementation
> > > > altogether, and that caused a performance drop (in a particular app, on
> > > > a particular compiler and CPU).
> >
> > I guess the compilers are just not where we want them to be yet.
> >
> > I don't mind generally replacing rte_memcpy() with memcpy() in the control plane.
> > But we should use whatever is more efficient in the data plane.
> >
> > We must also keep in mind that DPDK supports old distros with old compilers.
> > We should not remove a superfluous hand crafted optimization if a supported
> > old compiler hasn't caught up with it yet, i.e. if it isn't superfluous on
> > some of the old compilers supported by DPDK.
>
> When I scanned the result.
> 1. Most copies were small (like Ether address or IPv6 address) and compiler
> inlining should beat a function call every time.
Please note that rte_memcpy() is inline, so no function call is involved.

> 2. Larger structure copies were in control path.

Yep, I saw the same two things when scanning v1 of the series before acking it.

If we didn't overlook any fast path copies, this series is a good clean-up.

I must admit that I assume that any compiler's built-in memcpy() is able to efficiently copy small structures of build-time constant size. Assumptions are the mother of all FU's, but being wrong on this would be a very big surprise to me.
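To make that assumption concrete, here is a minimal sketch in plain C of the kind of small, fixed-size copy under discussion. It is not DPDK code: mac_addr, ip6_addr, copy_mac() and copy_ip6() are hypothetical stand-ins for an Ethernet address (6 bytes) and an IPv6 address (16 bytes). Because sizeof(*dst) is a build-time constant, I would expect an optimizing compiler to expand each memcpy() into a few plain loads and stores rather than a library call, but that is exactly the assumption in question, not a guarantee.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for small fast-path objects (NOT the DPDK types). */
struct mac_addr { uint8_t bytes[6]; };
struct ip6_addr { uint8_t bytes[16]; };

/* The copy size is known at build time; check it at build time too. */
static_assert(sizeof(struct mac_addr) == 6, "unexpected padding in mac_addr");
static_assert(sizeof(struct ip6_addr) == 16, "unexpected padding in ip6_addr");

/* memcpy() with a constant size: the compiler may inline this as a few
 * moves. Whether it actually does depends on compiler, version, flags
 * and target. */
static inline void copy_mac(struct mac_addr *dst, const struct mac_addr *src)
{
	memcpy(dst, src, sizeof(*dst));
}

static inline void copy_ip6(struct ip6_addr *dst, const struct ip6_addr *src)
{
	memcpy(dst, src, sizeof(*dst));
}

/* Plain struct assignment expresses the same fixed-size copy. */
static inline void assign_mac(struct mac_addr *dst, const struct mac_addr *src)
{
	*dst = *src;
}
```

Compiling this with -O2 -S on each supported old compiler and looking for remaining calls to memcpy in the assembly would be one way to check whether the hand-crafted rte_memcpy() is still needed there.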