https://gcc.gnu.org/bugzilla/show_bug.cgi?id=125550

--- Comment #2 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Artemiy Volkov <[email protected]>:

https://gcc.gnu.org/g:8076bc965f8aebafcb18739492432a19ee62a17c

commit r17-1448-g8076bc965f8aebafcb18739492432a19ee62a17c
Author: Artemiy Volkov <[email protected]>
Date:   Tue Jun 2 08:53:40 2026 +0000

    aarch64: use ZIP1 instead of UZP1 for concatenation [PR125550]

    This patch addresses the issue in PR125550, where two float16 values are
    concatenated using UZP1 instead of ZIP1.  For example, this code:

    #include <arm_sve.h>

    svfloat16_t foo (float x0, float x1)
    {
      return svdupq_n_f16 (x0, x1, x0, x1, x0, x1, x0, x1);
    }

    is being compiled into:

            fcvt    h0, s0
            fcvt    h1, s1
            uzp1    v0.4h, v0.4h, v1.4h
            mov     z0.s, s0
            ret

    causing the duplication of the 2-element vector ((float16) x0, 0) into z0
    instead of ((float16) x0, (float16) x1).
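    The lane movement in the sequence above can be traced with a small
    scalar model.  This is an illustrative sketch, not part of the patch:
    vreg, fcvt_h, uzp1_4h, zip1_4h, dup_s0 and run_foo are hypothetical
    helpers, lane values are symbolic tags rather than real float16
    encodings, and only the low 128 bits of z0 are modelled.  It relies on
    the architectural rule that a scalar fcvt and a 64-bit (.4h) vector
    write zero the remaining bits of the destination register.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of a 128-bit vector register as eight
   16-bit lanes.  Lane values are symbolic tags, not float16 encodings.  */
typedef struct { uint16_t h[8]; } vreg;

/* fcvt hN, sN: write lane 0 and zero the rest of the register.  */
static vreg fcvt_h (uint16_t value)
{
  vreg r;
  memset (&r, 0, sizeof r);
  r.h[0] = value;
  return r;
}

/* uzp1 vd.4h, vn.4h, vm.4h: even-numbered lanes of vn, then of vm.
   The 64-bit destination write zeroes lanes 4..7.  */
static vreg uzp1_4h (vreg n, vreg m)
{
  vreg r;
  memset (&r, 0, sizeof r);
  r.h[0] = n.h[0];
  r.h[1] = n.h[2];
  r.h[2] = m.h[0];
  r.h[3] = m.h[2];
  return r;
}

/* zip1 vd.4h, vn.4h, vm.4h: interleave lanes 0..1 of vn and vm.
   The 64-bit destination write zeroes lanes 4..7.  */
static vreg zip1_4h (vreg n, vreg m)
{
  vreg r;
  memset (&r, 0, sizeof r);
  r.h[0] = n.h[0];
  r.h[1] = m.h[0];
  r.h[2] = n.h[1];
  r.h[3] = m.h[1];
  return r;
}

/* mov z0.s, s0 (an alias of DUP): broadcast the low 32-bit element,
   i.e. halfword lanes 0 and 1, to every 32-bit element.  */
static vreg dup_s0 (vreg v)
{
  vreg r;
  for (int i = 0; i < 8; i += 2)
    {
      r.h[i] = v.h[0];
      r.h[i + 1] = v.h[1];
    }
  return r;
}

/* Model the code emitted for foo (x0, x1), with x0 -> a and x1 -> b,
   using either UZP1 (before the patch) or ZIP1 (after it).  */
static vreg run_foo (uint16_t a, uint16_t b, int use_zip1)
{
  vreg v0 = fcvt_h (a);
  vreg v1 = fcvt_h (b);
  vreg pair = use_zip1 ? zip1_4h (v0, v1) : uzp1_4h (v0, v1);
  return dup_s0 (pair);
}
```

    With a = 0xA and b = 0xB, run_foo (0xA, 0xB, 0) gives the lanes
    {0xA, 0, 0xA, 0, ...}, so x1 never reaches the low 32 bits, while
    run_foo (0xA, 0xB, 1) gives {0xA, 0xB, 0xA, 0xB, ...}, which is the
    pattern svdupq_n_f16 (x0, x1, ...) asks for.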

    This is a copy-paste error from the original combine_internal patterns,
    where UZP1 always operates on vectors of 2 elements, in which case it is
    equivalent to ZIP1.  For smaller element sizes (and thus higher element
    counts), only ZIP1 is correct.
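    The equivalence argument can be checked with generic models of the two
    permutes.  This sketch assumes hypothetical helpers uzp1, zip1 and
    same_result operating on arrays of 16-bit lanes: UZP1 takes the
    even-numbered elements of the first operand followed by those of the
    second, while ZIP1 interleaves the low halves of both operands.  With
    2 lanes both reduce to { n[0], m[0] }; with 4 lanes UZP1 yields
    { n[0], n[2], m[0], m[2] } and ZIP1 yields { n[0], m[0], n[1], m[1] }.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical generic models of the AdvSIMD permutes on vectors of
   LANES elements (LANES even).  D must not alias N or M.  */

/* UZP1: even-numbered elements of n, followed by those of m.  */
static void uzp1 (uint16_t *d, const uint16_t *n, const uint16_t *m,
                  size_t lanes)
{
  assert (lanes % 2 == 0);
  for (size_t i = 0; i < lanes / 2; i++)
    {
      d[i] = n[2 * i];
      d[lanes / 2 + i] = m[2 * i];
    }
}

/* ZIP1: interleave the low halves of n and m.  */
static void zip1 (uint16_t *d, const uint16_t *n, const uint16_t *m,
                  size_t lanes)
{
  assert (lanes % 2 == 0);
  for (size_t i = 0; i < lanes / 2; i++)
    {
      d[2 * i] = n[i];
      d[2 * i + 1] = m[i];
    }
}

/* Return 1 if UZP1 and ZIP1 produce the same vector for N and M.  */
static int same_result (const uint16_t *n, const uint16_t *m, size_t lanes)
{
  uint16_t du[16], dz[16];
  assert (lanes <= 16);
  uzp1 (du, n, m, lanes);
  zip1 (dz, n, m, lanes);
  return memcmp (du, dz, lanes * sizeof du[0]) == 0;
}
```

    For example, with n = {0xA, 0, 0, 0} and m = {0xB, 0, 0, 0} the two
    permutes disagree on 4 lanes, whereas any pair of 2-lane operands gives
    the same result from both.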

    The fix is to emit ZIP1 when concatenating values held in vector
    registers.  For consistency, I've changed the original combine_internal
    patterns as well as the ones added in r17-898-g920eeb67a3537b.  Since
    this latter change has nothing to do with the PR, it might have been
    better to split the patch in two; I'd be happy to do that if necessary.

    Both of these changes required adjusting existing AdvSIMD/SVE
    vec_init-related testcases; I've also added pr125550.c from the PR.

    Bootstrapped and regtested on aarch64-linux-gnu.

            PR target/125550

    gcc/ChangeLog:

            * config/aarch64/aarch64-simd.md
            (*aarch64_combine_internal<mode>): Use zip1 instead of uzp1
            to concatenate values residing in SIMD registers.
            (*aarch64_combine_internal_be<mode>): Likewise.

    gcc/testsuite/ChangeLog:

            * gcc.target/aarch64/ldp_stp_16.c: Adjust testcases.
            * gcc.target/aarch64/pr109072_1.c: Likewise.
            * gcc.target/aarch64/simd/mf8_data_1.c: Likewise.
            * gcc.target/aarch64/sve/vec_init_5.c: Likewise.
            * gcc.target/aarch64/vec-init-14.c: Likewise.
            * gcc.target/aarch64/vec-init-23.c: Likewise.
            * gcc.target/aarch64/vec-init-9.c: Likewise.
            * gcc.target/aarch64/sve/pr125550.c: New test.
