https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84719
--- Comment #2 from gpnuma at centaurean dot com ---
(In reply to Andrew Pinski from comment #1)
> Does -mcpu=native improve it?
> Also is GCC calling memcpy instead of doing an inline version?

No, -march=native does not make any difference.

And no, gcc is not calling memcpy: when I replace __builtin_memcpy with memcpy in the above code, it is somewhat slower, and the timing then matches clang/memcpy. It is only when comparing gcc/__builtin_memcpy against clang/__builtin_memcpy that the generated code shows a considerable performance difference, in favor of clang.
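
For readers without the original test case at hand, the two variants being compared can be sketched roughly as below. This is only an illustration, not the code from the report: the function names copy_builtin and copy_libc are hypothetical, and the real benchmark may use different sizes and call patterns.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical variant 1: copy through the compiler builtin.
   GCC and clang both accept __builtin_memcpy and may expand it inline. */
void copy_builtin(void *dst, const void *src, size_t n)
{
    __builtin_memcpy(dst, src, n);
}

/* Hypothetical variant 2: copy through the standard library name.
   The compiler is free to inline this as well, or to emit a call. */
void copy_libc(void *dst, const void *src, size_t n)
{
    memcpy(dst, src, n);
}
```

Both functions produce the same bytes; only the generated code may differ. One way to see whether a call is actually emitted is to compile with -S at the optimization level used for the benchmark and look for a memcpy call in the assembly output.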