[Bug target/107432] __builtin_convertvector generates inefficient code

g.peterhoff--- via Gcc-bugs Thu, 27 Oct 2022 09:14:49 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107432


--- Comment #2 from g.peterh...@t-online.de ---
Another example. I want to convert an array<Bool> to array<Float64>.
There are basically 3 options:
- Copy
- Test (b2f64_default)
- Optimized version (b2f64_manually)
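The Compiler Explorer link at the end holds the actual code. The sketch below only illustrates what the three options could look like. The bodies are assumptions: `bool`/`double` stand in for `Bool`/`Float64`, `b2f64_copy` is a hypothetical name, the "default" variant is assumed to be a per-element test, and the "manual" variant uses a bit-masking trick that may differ from the report's.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Option 1 (hypothetical "copy"): plain element-wise bool -> double assignment.
template <std::size_t N>
std::array<double, N> b2f64_copy(const std::array<bool, N>& in) {
    std::array<double, N> out{};
    for (std::size_t i = 0; i < N; ++i) out[i] = in[i];
    return out;
}

// Option 2 (assumed "default"): test each element explicitly.
template <std::size_t N>
std::array<double, N> b2f64_default(const std::array<bool, N>& in) {
    std::array<double, N> out{};
    for (std::size_t i = 0; i < N; ++i) out[i] = in[i] ? 1.0 : 0.0;
    return out;
}

// Option 3 (assumed "manual"): 1.0 has the bit pattern 0x3FF0000000000000
// and 0.0 is all zero bits, so mask that pattern with 0 or all-ones.
template <std::size_t N>
std::array<double, N> b2f64_manually(const std::array<bool, N>& in) {
    constexpr std::uint64_t one_bits = 0x3FF0000000000000ULL;
    std::array<double, N> out{};
    for (std::size_t i = 0; i < N; ++i) {
        // 0 - 1 wraps to all-ones; 0 - 0 stays 0. No data-dependent branch.
        std::uint64_t mask = 0 - static_cast<std::uint64_t>(in[i]);
        std::uint64_t bits = mask & one_bits;
        std::memcpy(&out[i], &bits, sizeof bits);
    }
    return out;
}
```

All three compute the same values. The report's point is that GCC compiles them to very different instruction sequences.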

GCC 12.2 and GCC trunk:
convertSIZE_copy generates only scalar code (_mm_cvtsi64_sd).
convertSIZE_default always generates conditional jumps.

convertSIZE_manually:
GCC trunk always generates branch-free scalar code.
GCC 12.2:
- convert1024_manually generates vector code, but does not use the hardware int8->int64 conversion (_mm_cvtepi8_epi64/_mm256_cvtepi8_epi64). Instead it converts int8->int16->int32->int64 step by step.
- convert8_manually generates branch-free scalar code.
- convert4_manually generates vector code and does use the hardware int8->int64 conversion.
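The bug title refers to `__builtin_convertvector`. The sketch below shows how a 4-element bool-to-double conversion can be written with GCC's vector extensions so that the compiler sees a single element-wise conversion. The type and function names are hypothetical, and the code assumes each `bool` is stored as one byte holding 0 or 1, as GCC does on x86. Whether GCC turns this into `pmovsx*` plus `cvtdq2pd`, or into scalar code, is exactly what the report questions, so check the output on Compiler Explorer rather than assuming it.

```cpp
#include <cstdint>
#include <cstring>

// GCC/Clang vector-extension types (hypothetical names).
typedef std::int8_t v4i8 __attribute__((vector_size(4)));
typedef double v4f64 __attribute__((vector_size(32)));

// Convert four bool bytes into four doubles with one element-wise conversion.
// Results go through a pointer so no 32-byte vector crosses the ABI boundary.
void convert4_builtin(const bool* in, double* out) {
    v4i8 bytes;
    std::memcpy(&bytes, in, sizeof bytes);  // four bytes, each 0 or 1
    v4f64 d = __builtin_convertvector(bytes, v4f64);
    std::memcpy(out, &d, sizeof d);
}
```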


NONE of these conversions are optimized to the point where, in every case:
- all available conversion intrinsics are used
- no general-purpose ("normal") registers are used
- branch-free code is generated
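For comparison, here is one hand-written x86 sketch that meets those three goals for the inner loop. It is an assumption-laden illustration, not the report's code. It widens int8->int32 with `_mm_cvtepi8_epi32` (SSE4.1) and converts with `_mm256_cvtepi32_pd` (AVX), rather than going through int64. Packed int64->double conversion (`_mm256_cvtepi64_pd`) requires AVX-512DQ/VL. All function names are hypothetical. It dispatches at runtime with `__builtin_cpu_supports` and falls back to scalar code on non-x86 targets or CPUs without AVX.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#endif

// Scalar fallback: bool -> double without data-dependent branches.
static void convert_scalar(const bool* in, double* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) out[i] = static_cast<double>(in[i]);
}

#if defined(__x86_64__) || defined(__i386__)
// Vector path: 4 bools per iteration, int8 -> int32 -> double in registers.
__attribute__((target("sse4.1,avx")))
static void convert_avx(const bool* in, double* out, std::size_t n) {
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        std::int32_t raw;
        std::memcpy(&raw, in + i, sizeof raw);   // load 4 bool bytes
        __m128i bytes = _mm_cvtsi32_si128(raw);  // movd into a vector register
        __m128i i32 = _mm_cvtepi8_epi32(bytes);  // pmovsxbd: 4 x int8 -> 4 x int32
        __m256d f64 = _mm256_cvtepi32_pd(i32);   // vcvtdq2pd: 4 x int32 -> 4 x double
        _mm256_storeu_pd(out + i, f64);
    }
    for (; i < n; ++i) out[i] = static_cast<double>(in[i]);  // remaining 0-3 elements
}
#endif

// Hypothetical dispatcher: pick the AVX path when the CPU supports it.
void convert_bools_to_f64(const bool* in, double* out, std::size_t n) {
#if defined(__x86_64__) || defined(__i386__)
    if (__builtin_cpu_supports("avx")) {
        convert_avx(in, out, n);
        return;
    }
#endif
    convert_scalar(in, out, n);
}
```

Going through int32 keeps the sketch within AVX. A compiler that fully exploited the target ISA could pick the int64 route on AVX-512 hardware instead.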

https://godbolt.org/z/f74vK79of

thx
Gero

