https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121853

--- Comment #5 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The releases/gcc-15 branch has been updated by Tamar Christina
<[email protected]>:

https://gcc.gnu.org/g:a4ee0e3597f953bdb3880f9fd90166cd83edff9f

commit r15-10525-ga4ee0e3597f953bdb3880f9fd90166cd83edff9f
Author: Tamar Christina <[email protected]>
Date:   Fri Oct 31 16:07:24 2025 +0000

    AArch64: support bf16 to sf extensions [PR121853]

    It looks like during the upstreaming of BF16 we didn't implement the extend
    optab for it.

    As a result we go through soft-float emulation, which results in a massive
    performance drop in projects using BF16.

    As an example, for

    float convert(__bf16 value) {
        return (float)value;
    }

    we generate:

    convert(__bf16):
            stp     x29, x30, [sp, -16]!
            mov     x29, sp
            bl      __extendbfsf2
            ldp     x29, x30, [sp], 16
            ret

    and after this patch we generate:

    convert:
            movi    v31.4s, 0
            ext     v0.16b, v31.16b, v0.16b, #14
            ret

    We generate an ext with movi because this has the same latency as a shift
    but twice the throughput.  The zero vector is zero latency, so in real
    workloads this codegen is much better than using shifts.

    As a reminder, BF16 -> FP32 is just shifting left 16 bits.
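
    As a rough illustration of that reminder, the portable C sketch below
    widens a raw BF16 bit pattern to a float by shifting it into the upper
    half of a 32-bit word.  The helper name bf16_bits_to_float and the use of
    uint16_t to carry the bits are assumptions made for this example; this is
    not the code GCC emits and not part of the patch.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the patch): widen a raw BF16 bit pattern
   to an IEEE-754 binary32 value.  BF16 shares the sign and exponent layout
   of binary32 and keeps only the top 7 mantissa bits, so the conversion is
   a 16-bit left shift with zeros filling the low mantissa bits.  */
static float bf16_bits_to_float(uint16_t bits)
{
    uint32_t widened = (uint32_t)bits << 16;
    float result;

    /* memcpy is the well-defined way to reinterpret the bits in C.  */
    memcpy(&result, &widened, sizeof result);
    return result;
}
```

    For instance, the BF16 pattern 0x3F80 widens to 0x3F800000, which is 1.0f.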

    The expand pattern has to rely on generating multiple subregs due to a
    restriction that subregs can't change floating point size and type at the
    same time.

    I've tried alternative approaches like using the EXT as SF mode, but the
    paradoxical subreg of BF -> SF isn't allowed and using an extend doesn't
    work because extend is what we're defining.

    gcc/ChangeLog:

            PR target/121853
            * config/aarch64/aarch64-simd.md (extendbfsf2): New.

    gcc/testsuite/ChangeLog:

            PR target/121853
            * gcc.target/aarch64/pr121853_1.c: New test.
            * gcc.target/aarch64/pr121853_2.c: New test.

    (cherry picked from commit 58ee2079230e0340187a5a9990891fb174034301)
