https://gcc.gnu.org/bugzilla/show_bug.cgi?id=125795

--- Comment #9 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Kyrylo Tkachov:

https://gcc.gnu.org/g:39de311c74d13949feab1fc9fe45654e0219b065

commit r17-1576-g39de311c74d13949feab1fc9fe45654e0219b065
Author: Kyrylo Tkachov
Date:   Mon Jun 15 04:53:34 2026 -0700

    aarch64: Fix early-ra wrong code with full-width FPR color groups [PR125795]

    early_ra::allocate_colors marks the FPRs occupied by a color with

      m_allocated_fprs |= ((1U << color->group->size) - 1) << best;

    When a color group spans the whole register file (size == 32), as can
    happen for a heavily unrolled, vectorized loop, "1U << 32" is undefined
    behavior. In practice it evaluates to 1 on AArch64 hosts, so the mask
    (1 - 1) << best is 0 and the expression sets no bits at all.
    The 32 FPRs of the group are therefore not recorded as allocated.
    Subsequent colors (and broaden_colors) then reuse those registers, which
    breaks the invariant that distinct colors receive disjoint FPRs.
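    As a rough illustration only, the failure can be modelled in plain C++.
    The helpers below (buggy_fpr_mask, overlaps) are hypothetical and are not
    the code in aarch64-early-ra.cc. Because a real 32-bit "1U << 32" is
    undefined, the model reduces the shift count modulo 32 explicitly, which
    is an assumption standing in for the AArch64 host behavior described above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the buggy mask computation. A 32-bit shift by 32 is
// undefined in C++, so the count is reduced modulo 32 explicitly to mimic
// the AArch64 host behavior where "1U << 32" yields 1.
std::uint32_t buggy_fpr_mask(unsigned size, unsigned best) {
  assert(best < 32);
  std::uint32_t base = 1U << (size & 31);  // size == 32 acts like 1U << 0
  return (base - 1) << best;               // (1 - 1) << best == 0
}

// Hypothetical check: would a color of width 'size' starting at FPR 'best'
// overlap registers already recorded in 'allocated'?
bool overlaps(std::uint32_t allocated, unsigned size, unsigned best) {
  std::uint32_t want =
      static_cast<std::uint32_t>(((1ULL << size) - 1) << best);
  return (allocated & want) != 0;
}
```

    After "allocated |= buggy_fpr_mask(32, 0)" the mask is still 0, so a
    later one-register color at v28 does not appear to overlap anything,
    even though the full-width group already owns v28.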

    In PR125795 this let the loop-invariant TBL permute index, which is live
    across the whole loop, share v28 with the LD2 tuple destinations, so the
    index was clobbered mid-loop and the loop produced wrong results.

    Fix this by using a 64-bit shift base: unsigned long long is at least
    64 bits on every host, so "1ULL << 32" is well-defined.
    best + size <= 32 is guaranteed by the candidate search, which the patch
    also asserts, so the result still fits in the 32-bit m_allocated_fprs.
    Now that the full-width group can no longer be hidden, allocate_colors
    correctly fails to find a register for the other color and the region is
    left to the real register allocator, matching -mearly-ra=none.

    Bootstrapped and tested on aarch64-none-linux-gnu.
    Pushing to trunk and later to the branches after testing.

    Signed-off-by: Kyrylo Tkachov

    gcc/ChangeLog:

            PR target/125795
            * config/aarch64/aarch64-early-ra.cc (early_ra::allocate_colors):
            Compute the allocated-FPR mask as
            ((1ULL << color->group->size) - 1) << best.

    gcc/testsuite/ChangeLog:

            PR target/125795
            * gcc.target/aarch64/pr125795.c: New test.
