https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121688

            Bug ID: 121688
           Summary: F16C/AVX512F cvtph2ps and cvtps2ph not used on
                    __builtin_convertvector
           Product: gcc
           Version: 16.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: mkretz at gcc dot gnu.org
  Target Milestone: ---
            Target: x86-64-*-*, i686-*-*

Test case (https://compiler-explorer.com/z/Yz7hvxGd1):

using v4hf [[gnu::vector_size(8)]] = _Float16;
using v8hf [[gnu::vector_size(16)]] = _Float16;
using v16hf [[gnu::vector_size(32)]] = _Float16;

using v4sf [[gnu::vector_size(16)]] = float;
using v8sf [[gnu::vector_size(32)]] = float;
using v16sf [[gnu::vector_size(64)]] = float;

v4sf cvtph2ps(v4hf x)
{ return __builtin_convertvector(x, v4sf); }

v4hf cvtps2ph(v4sf x)
{ return __builtin_convertvector(x, v4hf); }

v8sf cvtph2ps(v8hf x)
{ return __builtin_convertvector(x, v8sf); }

v8hf cvtps2ph(v8sf x)
{ return __builtin_convertvector(x, v8hf); }

v16sf cvtph2ps(v16hf x)
{ return __builtin_convertvector(x, v16sf); }

v16hf cvtps2ph(v16sf x)
{ return __builtin_convertvector(x, v16hf); }


Compile with -O2 -march=x86-64-v4 (or -march=x86-64-v3).

Each of these functions should compile to a single vcvtph2ps/vcvtps2ph
instruction followed by ret. This is similar to the code generated with
'-mavx512fp16', except that the trailing 'x' of the instruction name
(vcvtph2psx/vcvtps2phx) needs to be removed 😉.
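
For comparison, here is a minimal sketch of the 128-bit cases written with the
F16C intrinsics from <immintrin.h>, which are expected to map onto the desired
instructions directly. The function names (cvtph2ps_intrin, cvtps2ph_intrin,
roundtrip_demo) are made up for illustration and are not part of the test case.
The per-function target attribute is only an assumption so that the sketch
compiles without extra -m flags. The intrinsics operate on raw 16-bit patterns
in __m128i rather than on _Float16 vectors.

```cpp
#include <immintrin.h>

// Hypothetical helper: widen the four half-precision values stored in the
// low 64 bits of x to float. Expected to become a single vcvtph2ps.
[[gnu::target("f16c")]] __m128 cvtph2ps_intrin(__m128i x)
{ return _mm_cvtph_ps(x); }

// Hypothetical helper: narrow four floats to half precision using
// round-to-nearest-even. The result occupies the low 64 bits; the upper
// 64 bits are zeroed. Expected to become a single vcvtps2ph.
[[gnu::target("f16c")]] __m128i cvtps2ph_intrin(__m128 x)
{ return _mm_cvtps_ph(x, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC); }

// Round-trip check on exactly representable values. The bit patterns are
// IEEE binary16 encodings: 0x3C00 = 1.0, 0x3800 = 0.5, 0xC000 = -2.0,
// 0x7BFF = 65504 (largest finite half).
bool roundtrip_demo()
{
  const __m128i h = _mm_set_epi16(0, 0, 0, 0,
                                  0x7BFF, (short)0xC000, 0x3800, 0x3C00);
  const __m128 f = cvtph2ps_intrin(h);
  float out[4];
  _mm_storeu_ps(out, f);
  const bool widened_ok = out[0] == 1.0f && out[1] == 0.5f
                          && out[2] == -2.0f && out[3] == 65504.0f;
  const __m128i back = cvtps2ph_intrin(f);
  // Compare only the low 64 bits, which hold the four half values.
  return widened_ok && _mm_cvtsi128_si64(back) == _mm_cvtsi128_si64(h);
}
```

Running this requires a CPU with F16C. Comparing the assembly of these helpers
with that of the __builtin_convertvector versions above should show the
difference this report is about.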

(This seems to be a prerequisite for PR121587.)
