https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120428
--- Comment #16 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
(In reply to Jonathan Wakely from comment #15)
> (In reply to Hongtao Liu from comment #13)
> > The inner loop is not completely unrolled since std::copy is lowered to
> > __builtin_memmove instead of __builtin_memcpy
>
> std::copy allows the end of the output range to overlap with the start of
> the input range, so memcpy is not suitable.
>
> It was always using memmove, even before the r15-4475-g7ed561f63e7955
> changes to the library headers.

To clarify, I'm not saying that the source should use memcpy. I'm saying that
the compiler should be able to optimize the call to memcpy, as it does with
-mprefer-vector-width=256. In the source code, "buffer" looks like a local
array, so it should not overlap with either "*value_chunk" or "values"?
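
For illustration only, here is a minimal sketch of the kind of pattern being described. It is not the reduced test case from this bug: the function name process_chunk, the element type, and the size kChunk are all assumptions. The point is only that "buffer" is an automatic array whose address has not escaped, so neither std::copy call can involve overlapping ranges, and in principle memcpy semantics would be enough.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Hypothetical chunk size; the real test case may use a different one.
constexpr std::size_t kChunk = 8;

// Hypothetical stand-in for the code under discussion.
void process_chunk(const std::uint32_t* value_chunk, std::uint32_t* values) {
    // Local array: it cannot alias memory reachable through the
    // caller-supplied pointers value_chunk or values.
    std::uint32_t buffer[kChunk];

    // Source is caller memory, destination is the local array: no overlap.
    std::copy(value_chunk, value_chunk + kChunk, buffer);

    // Placeholder for whatever per-element work the real loop does.
    for (std::uint32_t& v : buffer) {
        v += 1;
    }

    // Source is the local array, destination is caller memory: no overlap.
    std::copy(buffer, buffer + kChunk, values);
}
```

Comparing the generated assembly for a function like this with and without -mprefer-vector-width=256 would be one way to see whether the copies are expanded inline or left as a memmove call.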