https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119442

--- Comment #3 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Kyrylo Tkachov <ktkac...@gcc.gnu.org>:

https://gcc.gnu.org/g:70391e3958db791edea4e877636592de47a785e7

commit r15-9062-g70391e3958db791edea4e877636592de47a785e7
Author: Kyrylo Tkachov <ktkac...@nvidia.com>
Date:   Mon Mar 24 01:53:06 2025 -0700

    PR middle-end/119442: expr.cc: Fix vec_duplicate into vector boolean modes

    In this testcase GCC tries to expand a VNx4BI vector:
      vector(4) <signed-boolean:4> _40;
      _39 = (<signed-boolean:4>) _24;
      _40 = {_39, _39, _39, _39};

    This ends up in a scalarised sequence of bitfield insert operations.
    This is despite the fact that AArch64 provides a vec_duplicate pattern
    specifically for vec_duplicate into VNx4BI.

    The store_constructor code is overly conservative when trying vec_duplicate
    as it sees a requested VNx4BImode and an element mode of QImode, which I
    guess is the storage mode of BImode objects.

    The vec_duplicate expander in aarch64-sve.md explicitly allows QImode
    element modes so it should be safe to use it.  This patch extends that
    mode check to allow such expanders.
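
The widened check can be modelled in isolation as below.  This is a toy
sketch, not the code in expr.cc: the enum, the toy_* helpers and the two
elt_mode_ok_* functions are hypothetical stand-ins.  The only facts taken from
the description above are that VNx4BImode is requested with a QImode element
and that the AArch64 vec_duplicate pattern accepts QImode; the inner mode
assigned to VNx4BImode here is an assumption.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for machine modes; not GCC's real mode enum.  */
enum toy_mode
{
  TOY_BImode,
  TOY_QImode,
  TOY_SImode,
  TOY_VNx4BImode,
  TOY_VNx4SImode
};

/* Toy counterpart of "the normal inner mode of a vector mode".
   Assumption: the inner mode of VNx4BImode is BImode.  */
enum toy_mode
toy_inner_mode (enum toy_mode vec)
{
  switch (vec)
    {
    case TOY_VNx4BImode: return TOY_BImode;
    case TOY_VNx4SImode: return TOY_SImode;
    default:             return vec;
    }
}

/* Element mode that the (modelled) target vec_duplicate pattern declares
   for its scalar input.  The QImode entry mirrors what the commit message
   says about aarch64-sve.md; the rest is illustrative.  */
enum toy_mode
toy_pattern_elt_mode (enum toy_mode vec)
{
  switch (vec)
    {
    case TOY_VNx4BImode: return TOY_QImode;
    default:             return toy_inner_mode (vec);
    }
}

/* Conservative check: only the normal inner mode is accepted.  */
bool
elt_mode_ok_before (enum toy_mode vec, enum toy_mode elt)
{
  return elt == toy_inner_mode (vec);
}

/* Widened check: also accept an element mode that the target pattern
   explicitly allows.  */
bool
elt_mode_ok_after (enum toy_mode vec, enum toy_mode elt)
{
  return elt_mode_ok_before (vec, elt) || elt == toy_pattern_elt_mode (vec);
}
```

In this model the (VNx4BImode, QImode) pair is rejected by the conservative
check and accepted by the widened one, while an unrelated pairing such as
(VNx4SImode, QImode) is still rejected.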

    The testcase is heavily auto-reduced from a real application and is
    nonsensical in itself, but it does demonstrate the current problematic
    codegen.

    With this patch the testcase goes from:
            pfalse  p15.b
            str     p15, [sp, #6, mul vl]
            mov     w0, 0
            ldr     w2, [sp, 12]
            bfi     w2, w0, 0, 4
            uxtw    x2, w2
            bfi     w2, w0, 4, 4
            uxtw    x2, w2
            bfi     w2, w0, 8, 4
            uxtw    x2, w2
            bfi     w2, w0, 12, 4
            str     w2, [sp, 12]
            ldr     p15, [sp, #6, mul vl]

    into:
            whilelo p15.s, wzr, wzr

    The whilelo could be optimised away into a pfalse of course, but the
    important part is that the bfis are gone.
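
To make the cost of the scalarised form concrete, the four bfi instructions
in the "before" listing can be modelled in plain C.  This is a sketch only:
bfi32 is a hand-written model of the A64 BFI instruction, the function names
are hypothetical, and the layout (four 4-bit fields in the low 16 bits of a
32-bit word) is read off the bfi operands in the listing.  The uxtw
instructions are not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Model of "bfi Wd, Wn, #lsb, #width": copy the low WIDTH bits of SRC into
   DST starting at bit LSB, leaving the other bits of DST unchanged.
   Restricted here to 1 <= width <= 31 and lsb + width <= 32.  */
uint32_t
bfi32 (uint32_t dst, uint32_t src, unsigned lsb, unsigned width)
{
  assert (width >= 1 && width <= 31 && lsb + width <= 32);
  uint32_t mask = ((1u << width) - 1u) << lsb;
  return (dst & ~mask) | ((src << lsb) & mask);
}

/* Scalarised expansion: one bitfield insert per 4-bit element.  */
uint32_t
splat4_scalarised (uint32_t old, uint32_t elt)
{
  uint32_t w = old;
  w = bfi32 (w, elt, 0, 4);
  w = bfi32 (w, elt, 4, 4);
  w = bfi32 (w, elt, 8, 4);
  w = bfi32 (w, elt, 12, 4);
  return w;
}

/* The same result as a single duplicate of the 4-bit element into all
   four fields.  */
uint32_t
splat4_direct (uint32_t old, uint32_t elt)
{
  return (old & 0xFFFF0000u) | ((elt & 0xFu) * 0x1111u);
}
```

Both functions produce the same word for any inputs, which is the sense in
which the insert sequence is just a duplicate written out element by element.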

    Bootstrapped and tested on aarch64-none-linux-gnu.

    Signed-off-by: Kyrylo Tkachov <ktkac...@nvidia.com>

    gcc/

            PR middle-end/119442
            * expr.cc (store_constructor): Also allow element modes explicitly
            accepted by target vec_duplicate pattern.

    gcc/testsuite/

            PR middle-end/119442
            * gcc.target/aarch64/vls_sve_vec_dup_1.c: New test.
