https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110105
--- Comment #4 from Pavel M <pavel.morozkin at gmail dot com> ---
To: rsand...@gcc.gnu.org

Thanks! I confused __fp16 with _Float16.

However, if __fp16 is only a “storage type”, then why does this code:

    __fp16 mul(__fp16 x, __fp16 y)
    {
        return x * y;
    }

compiled with -O3 -mfpu=fp-armv8 -march=armv8.2-a+fp16, produce this code:

    mul:
            vmul.f16        s0, s0, s1
            bx      lr

Here we see vmul.f16 instead of half->float->vmul.f32->float->half. As a user, I expect half->float->vmul.f32->float->half (because __fp16 is only a “storage type”). Where are the conversions and the vmul.f32?

P.S. If the optimizer does this, then as I remember, half->float->op->float->half does not always produce the same result as half->op->half. The results may differ by +/-1 in the last bit. Any comments?
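One way to probe the P.S. for the single-multiply case is sketched below. It is only an illustration, not a statement about what GCC actually does: the function name count_inexact_float_products is hypothetical, and half-precision values are modelled as integers with at most 11 significant bits rather than with a real __fp16 type. The idea is that if the float product of two such values is always exact, then half->float->mul->float->half rounds only once (on the final conversion to half), just as a direct half multiply does. Chained operations, where an intermediate result stays in float, are not covered by this check and may well behave differently.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not part of GCC or any library).
 *
 * A half-precision value has at most 11 significant bits (10 stored plus
 * the implicit leading bit; subnormals have fewer). Ignoring the exponent,
 * which only scales by a power of two, every such significand is an
 * integer in [1, 2047].
 *
 * For every pair of significands, multiply in float and compare with the
 * exact integer product. Returns the number of pairs where the float
 * product was rounded, i.e. where widening to float would add an extra
 * rounding step before the final conversion back to half.
 */
unsigned long count_inexact_float_products(void)
{
    unsigned long inexact = 0;

    for (uint32_t a = 1; a < 2048; ++a) {
        for (uint32_t b = a; b < 2048; ++b) {
            /* volatile forces the product to be stored as a real float,
               so excess intermediate precision cannot hide a rounding. */
            volatile float product = (float)a * (float)b;

            /* Exact product: at most 2047 * 2047, which fits in 22 bits. */
            uint32_t exact = a * b;

            if ((double)product != (double)exact)
                ++inexact;
        }
    }
    return inexact;
}
```

Calling count_inexact_float_products() from a small main and printing the result is enough to run the check. A 22-bit product fits in float's 24-bit significand, so the count is expected to be 0 under this model.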