https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90262
--- Comment #11 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by H.J. Lu <[email protected]>:

https://gcc.gnu.org/g:b41f96465190751561f6909e858604ceab00595b

commit r16-4947-gb41f96465190751561f6909e858604ceab00595b
Author: H.J. Lu <[email protected]>
Date:   Mon Oct 20 16:14:34 2025 +0800

    x86-64: Inline memmove with overlapping unaligned loads and stores

    Inline memmove in 64-bit mode, since far fewer registers are available
    in 32-bit mode:

    1. Load all sources into registers and store them together to avoid
       possible address overlap between source and destination.
    2. For known size, first try to fully unroll with 8 registers.
    3. For size <= 2 * MOVE_MAX, load all sources into 2 registers first
       and then store them together.
    4. For size > 2 * MOVE_MAX and size <= 4 * MOVE_MAX, load all sources
       into 4 registers first and then store them together.
    5. For size > 4 * MOVE_MAX and size <= 8 * MOVE_MAX, load all sources
       into 8 registers first and then store them together.
    6. For size > 8 * MOVE_MAX,
       a. If address of destination > address of source, copy backward
          with a 4 * MOVE_MAX loop with unaligned loads and stores.  Load
          the first 4 * MOVE_MAX into 4 registers before the loop and
          store them after the loop to support overlapping addresses.
       b. Otherwise, copy forward with a 4 * MOVE_MAX loop with unaligned
          loads and stores.  Load the last 4 * MOVE_MAX into 4 registers
          before the loop and store them after the loop to support
          overlapping addresses.

    Verified and benchmarked memmove implementations inlined with GPR, SSE2,
    AVX2 and AVX512 using glibc memmove tests.  They are available at

    https://gitlab.com/x86-glibc/glibc/-/commits/users/hjl/test/memmove

    Their performance is comparable to that of the optimized memmove
    implementations in glibc on Intel Core i7-1195G7.

    gcc/

            PR target/90262
            * config/i386/i386-expand.cc (ix86_expand_unroll_movmem): New.
            (ix86_expand_n_move_movmem): Likewise.
            (ix86_expand_load_movmem): Likewise.
            (ix86_expand_store_movmem): Likewise.
            (ix86_expand_n_overlapping_move_movmem): Likewise.
            (ix86_expand_less_move_movmem): Likewise.
            (ix86_expand_movmem): Likewise.
            * config/i386/i386-protos.h (ix86_expand_movmem): Likewise.
            * config/i386/i386.md (movmem<mode>): Likewise.

    gcc/testsuite/

            * gcc.target/i386/builtin-memmove-1a.c: New test.
            * gcc.target/i386/builtin-memmove-1b.c: Likewise.
            * gcc.target/i386/builtin-memmove-1c.c: Likewise.
            * gcc.target/i386/builtin-memmove-1d.c: Likewise.
            * gcc.target/i386/builtin-memmove-2a.c: Likewise.
            * gcc.target/i386/builtin-memmove-2b.c: Likewise.
            * gcc.target/i386/builtin-memmove-2c.c: Likewise.
            * gcc.target/i386/builtin-memmove-2d.c: Likewise.
            * gcc.target/i386/builtin-memmove-3a.c: Likewise.
            * gcc.target/i386/builtin-memmove-3b.c: Likewise.
            * gcc.target/i386/builtin-memmove-3c.c: Likewise.
            * gcc.target/i386/builtin-memmove-4a.c: Likewise.
            * gcc.target/i386/builtin-memmove-4b.c: Likewise.
            * gcc.target/i386/builtin-memmove-4c.c: Likewise.
            * gcc.target/i386/builtin-memmove-5a.c: Likewise.
            * gcc.target/i386/builtin-memmove-5b.c: Likewise.
            * gcc.target/i386/builtin-memmove-5c.c: Likewise.
            * gcc.target/i386/builtin-memmove-6.c: Likewise.
            * gcc.target/i386/builtin-memmove-7.c: Likewise.
            * gcc.target/i386/builtin-memmove-8.c: Likewise.
            * gcc.target/i386/builtin-memmove-9.c: Likewise.
            * gcc.target/i386/builtin-memmove-10.c: Likewise.
            * gcc.target/i386/builtin-memmove-11a.c: Likewise.
            * gcc.target/i386/builtin-memmove-11b.c: Likewise.
            * gcc.target/i386/builtin-memmove-11c.c: Likewise.
            * gcc.target/i386/builtin-memmove-12.c: Likewise.
            * gcc.target/i386/builtin-memmove-13.c: Likewise.
            * gcc.target/i386/builtin-memmove-14.c: Likewise.
            * gcc.target/i386/builtin-memmove-15.c: Likewise.

    Signed-off-by: H.J. Lu <[email protected]>
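
For readers who want to see the strategy from the commit message in ordinary code, here is a rough portable C sketch of steps 3 through 6. It is not the GCC expander, which emits RTL. Everything named here (MM, chunk_t, load_chunk, store_chunk, move_overlapping, sketch_memmove) is hypothetical, and MM = 16 is only an assumed stand-in for MOVE_MAX (one SSE2 register). Sizes below MM are handled through a small temporary buffer purely as a simplification; the commit message does not describe that case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: 16 bytes stands in for MOVE_MAX (one SSE2 register). */
enum { MM = 16 };

/* One "register": a MOVE_MAX-sized value moved with unaligned accesses. */
typedef struct { unsigned char b[MM]; } chunk_t;

static chunk_t load_chunk(const unsigned char *p)
{
    chunk_t c;
    memcpy(c.b, p, MM);
    return c;
}

static void store_chunk(unsigned char *p, chunk_t c)
{
    memcpy(p, c.b, MM);
}

/* Steps 3-5: for half*MM <= n <= 2*half*MM, load `half` chunks from the
   start and `half` chunks from the end (they may overlap each other),
   then store everything.  All loads finish before the first store, so
   overlap between source and destination is harmless. */
static void move_overlapping(unsigned char *d, const unsigned char *s,
                             size_t n, size_t half)
{
    chunk_t r[8];
    assert(half >= 1 && half <= 4 && n >= half * MM && n <= 2 * half * MM);
    for (size_t i = 0; i < half; i++) {
        r[i] = load_chunk(s + i * MM);
        r[half + i] = load_chunk(s + n - (half - i) * MM);
    }
    for (size_t i = 0; i < half; i++) {
        store_chunk(d + i * MM, r[i]);
        store_chunk(d + n - (half - i) * MM, r[half + i]);
    }
}

static void load4(chunk_t r[4], const unsigned char *s)
{
    for (size_t i = 0; i < 4; i++)
        r[i] = load_chunk(s + i * MM);
}

static void store4(unsigned char *d, const chunk_t r[4])
{
    for (size_t i = 0; i < 4; i++)
        store_chunk(d + i * MM, r[i]);
}

void *sketch_memmove(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    chunk_t saved[4], r[4];

    if (n == 0)
        return dst;
    if (n < MM) {
        /* Simplification (not from the commit): bounce through a buffer. */
        unsigned char tmp[MM];
        memcpy(tmp, s, n);
        memcpy(d, tmp, n);
    } else if (n <= 2 * MM) {
        move_overlapping(d, s, n, 1);            /* step 3: 2 registers */
    } else if (n <= 4 * MM) {
        move_overlapping(d, s, n, 2);            /* step 4: 4 registers */
    } else if (n <= 8 * MM) {
        move_overlapping(d, s, n, 4);            /* step 5: 8 registers */
    } else if ((uintptr_t)d > (uintptr_t)s) {
        /* Step 6a: save the first 4*MM, copy backward, store it last. */
        load4(saved, s);
        for (size_t pos = n; pos > 4 * MM; pos -= 4 * MM) {
            load4(r, s + pos - 4 * MM);
            store4(d + pos - 4 * MM, r);
        }
        store4(d, saved);
    } else {
        /* Step 6b: save the last 4*MM, copy forward, store it last. */
        load4(saved, s + n - 4 * MM);
        for (size_t pos = 0; n - pos > 4 * MM; pos += 4 * MM) {
            load4(r, s + pos);
            store4(d + pos, r);
        }
        store4(d + n - 4 * MM, saved);
    }
    return dst;
}
```

A call such as sketch_memmove(buf + 1, buf, 200) would take the backward path of step 6a, and sketch_memmove(buf, buf + 1, 200) the forward path of step 6b. In both loops the block saved before the loop covers whatever partial block the loop leaves behind, which is why no separate tail handling is needed.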
