https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111354
--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> --- First off, the performance difference is due to micro-arch issues with unaligned stores of 256 bits. Also, iirc rte_mov128blocks is tuned for copying blocks which are aligned to at least 32 bytes. But you are better off asking the dpdk forum why they don't just use memcpy here.
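
To make the alignment point concrete, here is a minimal sketch of how one might check whether the buffers are 32-byte aligned before blaming the copy routine. The helper names (is_aligned_32, copy_report_alignment) are hypothetical and not part of DPDK or GCC, and the copy itself is just memcpy, not rte_mov128blocks.

```c
#include <stdalign.h>   /* alignas, for callers that want aligned buffers */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: true if p sits on a 32-byte boundary,
 * i.e. a 256-bit store to p would be an aligned store. */
static bool is_aligned_32(const void *p)
{
    return ((uintptr_t)p & 31u) == 0;
}

/* Hypothetical wrapper: copies n bytes with plain memcpy and reports
 * whether both pointers were 32-byte aligned. Assumes the buffers do
 * not overlap, as memcpy requires. */
static bool copy_report_alignment(void *dst, const void *src, size_t n)
{
    memcpy(dst, src, n);
    return is_aligned_32(dst) && is_aligned_32(src);
}
```

If copy_report_alignment returns false for the buffers in the benchmark, the 256-bit stores are unaligned there, which would be consistent with the micro-arch explanation above. Declaring the test buffers with alignas(32) is one way to rule that out.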