[Bug target/110105] ARM GCC: underoptimization: expected vfma.f16, actual vcvtb-vfma.f32-vcvtb

pavel.morozkin at gmail dot com via Gcc-bugs Mon, 12 Jun 2023 11:48:27 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110105


--- Comment #4 from Pavel M <pavel.morozkin at gmail dot com> ---
To: rsand...@gcc.gnu.org

Thanks! I confused __fp16 with _Float16.

However, if __fp16 is only a “storage type”, then why does this code:
__fp16 mul(__fp16 x, __fp16 y)
{
    return x * y;
}

compiled with -O3 -mfpu=fp-armv8 -march=armv8.2-a+fp16

lead to this code:
mul:
        vmul.f16        s0, s0, s1
        bx      lr

Here we see vmul.f16 instead of half->float->vmul.f32->float->half.

As a user, I expect half->float->vmul.f32->float->half (because __fp16 is only
a “storage type”).

Where are the conversions and the vmul.f32?

P.S. If the optimizer does this, then as I remember, half->float->op->float->half
does not always produce the same result as half->op->half. The results may
differ by 1 in the last bit. Any comments?
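
For multiplication specifically, the P.S. can be checked by arithmetic. This is
a minimal sketch, assuming float is IEEE 754 binary32 and half is IEEE 754
binary16; the function name count_inexact_products is hypothetical. A normal
binary16 value has an 11-bit significand, so the product of two of them needs
at most 22 bits and fits exactly in the 24-bit significand of a float. The
exponent of such a product also stays well inside the normal range of float.
The float product is therefore exact, and converting it back to half rounds
only once, the same as a direct half multiply would. The sketch checks the
significand part of that argument exhaustively:

```c
#include <assert.h>
#include <float.h>

/* Assumption: float has at least 22 significand bits (binary32 has 24). */
static_assert(FLT_MANT_DIG >= 22, "float too narrow for this argument");

/* Every normal binary16 value is m * 2^e with an 11-bit integer m. */
enum { HALF_SIG_MIN = 1024, HALF_SIG_MAX = 2047 };

/* Count pairs of binary16 significands whose product is NOT exact in float.
   Hypothetical helper, for illustration only. */
long count_inexact_products(void)
{
    long inexact = 0;
    for (int a = HALF_SIG_MIN; a <= HALF_SIG_MAX; a++) {
        for (int b = HALF_SIG_MIN; b <= HALF_SIG_MAX; b++) {
            /* volatile forces the product to be stored as a real float,
               even if the target evaluates in a wider format. */
            volatile float pf = (float)a * (float)b;
            /* Exact reference: the product is far below 2^53. */
            double pd = (double)a * (double)b;
            if ((double)pf != pd)
                inexact++;
        }
    }
    return inexact;
}
```

Calling count_inexact_products() from main and printing the result should show
no inexact pairs. This covers multiplication only. It says nothing about why
GCC selects vmul.f16 here, and the argument does not carry over to a fused
multiply-add, where the sum can need more than 24 bits and an intermediate
rounding to float may change the final half result.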
