https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90262

--- Comment #11 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by H.J. Lu <[email protected]>:

https://gcc.gnu.org/g:b41f96465190751561f6909e858604ceab00595b

commit r16-4947-gb41f96465190751561f6909e858604ceab00595b
Author: H.J. Lu <[email protected]>
Date:   Mon Oct 20 16:14:34 2025 +0800

    x86-64: Inline memmove with overlapping unaligned loads and stores

    Inline memmove in 64-bit mode, since far fewer registers are available
    in 32-bit mode:

    1. Load all sources into registers and store them together to avoid
       possible address overlap between source and destination.
    2. For known size, first try to fully unroll with 8 registers.
    3. For size <= 2 * MOVE_MAX, load all sources into 2 registers first
       and then store them together.
    4. For size > 2 * MOVE_MAX and size <= 4 * MOVE_MAX, load all sources
       into 4 registers first and then store them together.
    5. For size > 4 * MOVE_MAX and size <= 8 * MOVE_MAX, load all sources
       into 8 registers first and then store them together.
    6. For size > 8 * MOVE_MAX,
       a. If address of destination > address of source, copy backward
          with a 4 * MOVE_MAX loop with unaligned loads and stores.  Load
          the first 4 * MOVE_MAX into 4 registers before the loop and
          store them after the loop to support overlapping addresses.
       b. Otherwise, copy forward with a 4 * MOVE_MAX loop with unaligned
          loads and stores.  Load the last 4 * MOVE_MAX into 4 registers
          before the loop and store them after the loop to support
          overlapping addresses.
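
The steps above can be sketched in plain C. This is only an illustration of the load-everything-then-store idea, not the RTL that the patch emits: `CHUNK` is a hypothetical stand-in for MOVE_MAX fixed at 8 bytes (one general-purpose register), and `load8`, `store8`, `move_small` and `move_large` are names invented for this sketch. `move_small` corresponds to step 3 and `move_large` to step 6, under the assumption that the leftover bytes are covered by the 4 chunks saved before the loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for MOVE_MAX: one 8-byte general-purpose register. */
enum { CHUNK = 8 };

/* Unaligned 8-byte load and store; memcpy keeps them well defined in C. */
static uint64_t load8(const unsigned char *p) { uint64_t v; memcpy(&v, p, 8); return v; }
static void store8(unsigned char *p, uint64_t v) { memcpy(p, &v, 8); }

/* Step 3 analogue: CHUNK <= n <= 2 * CHUNK.  The head and tail loads may
   overlap each other; both complete before the first store, so an
   overlapping destination cannot corrupt the source.  */
static void move_small(unsigned char *d, const unsigned char *s, size_t n)
{
    assert(n >= CHUNK && n <= 2 * CHUNK);
    uint64_t head = load8(s);
    uint64_t tail = load8(s + n - CHUNK);
    store8(d, head);
    store8(d + n - CHUNK, tail);
}

/* Step 6 analogue: n > 8 * CHUNK, copied in 4 * CHUNK blocks.  */
static void move_large(unsigned char *d, const unsigned char *s, size_t n)
{
    assert(n > 8 * CHUNK);
    uint64_t saved[4], t[4];
    if ((uintptr_t)d > (uintptr_t)s) {
        /* 6a: save the first 4 chunks, then copy backward from the end. */
        for (int i = 0; i < 4; i++)
            saved[i] = load8(s + i * CHUNK);
        size_t end = n;
        while (end > 4 * CHUNK) {
            for (int i = 0; i < 4; i++)
                t[i] = load8(s + end - (i + 1) * CHUNK);
            for (int i = 0; i < 4; i++)
                store8(d + end - (i + 1) * CHUNK, t[i]);
            end -= 4 * CHUNK;
        }
        /* The saved head covers whatever the loop left at the start. */
        for (int i = 0; i < 4; i++)
            store8(d + i * CHUNK, saved[i]);
    } else {
        /* 6b: save the last 4 chunks, then copy forward from the start. */
        for (int i = 0; i < 4; i++)
            saved[i] = load8(s + n - 4 * CHUNK + i * CHUNK);
        size_t pos = 0;
        while (pos + 4 * CHUNK < n) {
            for (int i = 0; i < 4; i++)
                t[i] = load8(s + pos + i * CHUNK);
            for (int i = 0; i < 4; i++)
                store8(d + pos + i * CHUNK, t[i]);
            pos += 4 * CHUNK;
        }
        /* The saved tail covers whatever the loop left at the end. */
        for (int i = 0; i < 4; i++)
            store8(d + n - 4 * CHUNK + i * CHUNK, saved[i]);
    }
}
```

Saving one 4-chunk block before the loop means the size need not be a multiple of 4 * CHUNK: the final store of the saved block overlaps bytes the loop already wrote, and it writes the same values there.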

    The memmove implementations inlined with GPR, SSE2, AVX2 and AVX512
    were verified and benchmarked using the glibc memmove tests.  The
    branch is available at

    https://gitlab.com/x86-glibc/glibc/-/commits/users/hjl/test/memmove

    Their performance is comparable to that of the optimized memmove
    implementations in glibc on Intel Core i7-1195G7.

    gcc/

            PR target/90262
            * config/i386/i386-expand.cc (ix86_expand_unroll_movmem): New.
            (ix86_expand_n_move_movmem): Likewise.
            (ix86_expand_load_movmem): Likewise.
            (ix86_expand_store_movmem): Likewise.
            (ix86_expand_n_overlapping_move_movmem): Likewise.
            (ix86_expand_less_move_movmem): Likewise.
            (ix86_expand_movmem): Likewise.
            * config/i386/i386-protos.h (ix86_expand_movmem): Likewise.
            * config/i386/i386.md (movmem<mode>): Likewise.

    gcc/testsuite/

            * gcc.target/i386/builtin-memmove-1a.c: New test.
            * gcc.target/i386/builtin-memmove-1b.c: Likewise.
            * gcc.target/i386/builtin-memmove-1c.c: Likewise.
            * gcc.target/i386/builtin-memmove-1d.c: Likewise.
            * gcc.target/i386/builtin-memmove-2a.c: Likewise.
            * gcc.target/i386/builtin-memmove-2b.c: Likewise.
            * gcc.target/i386/builtin-memmove-2c.c: Likewise.
            * gcc.target/i386/builtin-memmove-2d.c: Likewise.
            * gcc.target/i386/builtin-memmove-3a.c: Likewise.
            * gcc.target/i386/builtin-memmove-3b.c: Likewise.
            * gcc.target/i386/builtin-memmove-3c.c: Likewise.
            * gcc.target/i386/builtin-memmove-4a.c: Likewise.
            * gcc.target/i386/builtin-memmove-4b.c: Likewise.
            * gcc.target/i386/builtin-memmove-4c.c: Likewise.
            * gcc.target/i386/builtin-memmove-5a.c: Likewise.
            * gcc.target/i386/builtin-memmove-5b.c: Likewise.
            * gcc.target/i386/builtin-memmove-5c.c: Likewise.
            * gcc.target/i386/builtin-memmove-6.c: Likewise.
            * gcc.target/i386/builtin-memmove-7.c: Likewise.
            * gcc.target/i386/builtin-memmove-8.c: Likewise.
            * gcc.target/i386/builtin-memmove-9.c: Likewise.
            * gcc.target/i386/builtin-memmove-10.c: Likewise.
            * gcc.target/i386/builtin-memmove-11a.c: Likewise.
            * gcc.target/i386/builtin-memmove-11b.c: Likewise.
            * gcc.target/i386/builtin-memmove-11c.c: Likewise.
            * gcc.target/i386/builtin-memmove-12.c: Likewise.
            * gcc.target/i386/builtin-memmove-13.c: Likewise.
            * gcc.target/i386/builtin-memmove-14.c: Likewise.
            * gcc.target/i386/builtin-memmove-15.c: Likewise.

    Signed-off-by: H.J. Lu <[email protected]>
