https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120456

            Bug ID: 120456
           Summary: __builtin_shuffle produces unnecessary vperm2i128
           Product: gcc
           Version: 14.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: adamant.pwn at gmail dot com
  Target Milestone: ---

Consider the following snippet:

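#include <stdint.h>

// Assumed definitions (GCC vector extensions); the Compiler Explorer link
// below has the exact ones used.
typedef uint8_t  u8x32 __attribute__((vector_size(32)));
typedef uint32_t u32x8 __attribute__((vector_size(32)));

auto test1(uint32_t bits) {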
    auto bytes = u8x32(u32x8() + bits);
    u8x32 shuffler = {
        0, 0, 0, 0, 0, 0, 0, 0,
        1, 1, 1, 1, 1, 1, 1, 1,
        2, 2, 2, 2, 2, 2, 2, 2,
        3, 3, 3, 3, 3, 3, 3, 3
    };
    auto shuffle = __builtin_shuffle(bytes, shuffler);
    return shuffle;
}

GCC emits an unnecessary vperm2i128 instruction for this function. The same
happens with __builtin_shufflevector. If I instead change one of the indices to
refer to the second half of the vector, e.g. 3 to 31, it emits an unnecessary
vpermq instead of vperm2i128.
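
To illustrate why the cross-lane permute looks redundant here, below is a
minimal scalar model (a sketch, not the code from the Compiler Explorer link).
It assumes little-endian byte order, and all helper names (Bytes32,
broadcast_u32, full_shuffle, in_lane_shuffle, make_shuffler,
check_lanes_agree) are hypothetical. Because every 32-bit element of the
source holds the same value, both 128-bit halves are identical, so an in-lane
byte shuffle modelled on vpshufb should give the same result as a full
32-byte shuffle:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Bytes32 = std::array<std::uint8_t, 32>;

// Models u8x32(u32x8() + bits): the same 32-bit value in all eight elements,
// viewed as bytes (little-endian byte order assumed).
Bytes32 broadcast_u32(std::uint32_t bits) {
    Bytes32 out{};
    for (std::size_t i = 0; i < 32; ++i)
        out[i] = static_cast<std::uint8_t>(bits >> (8 * (i % 4)));
    return out;
}

// Full shuffle: each index may select any of the 32 source bytes
// (taken modulo 32, as __builtin_shuffle does).
Bytes32 full_shuffle(const Bytes32& src, const Bytes32& idx) {
    Bytes32 out{};
    for (std::size_t i = 0; i < 32; ++i)
        out[i] = src[idx[i] % 32];
    return out;
}

// In-lane shuffle modelled on vpshufb: each 16-byte half reads only from
// itself, using the low 4 bits of the index; a set top bit yields zero.
Bytes32 in_lane_shuffle(const Bytes32& src, const Bytes32& idx) {
    Bytes32 out{};
    for (std::size_t i = 0; i < 32; ++i) {
        const std::size_t lane_base = (i / 16) * 16;
        out[i] = (idx[i] & 0x80) ? 0 : src[lane_base + (idx[i] & 15)];
    }
    return out;
}

// The index vector from the report: eight 0s, eight 1s, eight 2s, eight 3s.
Bytes32 make_shuffler() {
    Bytes32 idx{};
    for (std::size_t i = 0; i < 32; ++i)
        idx[i] = static_cast<std::uint8_t>(i / 8);
    return idx;
}

// Checks that both shuffles agree on a broadcast source.
void check_lanes_agree(std::uint32_t bits) {
    const Bytes32 src = broadcast_u32(bits);
    const Bytes32 idx = make_shuffler();
    assert(full_shuffle(src, idx) == in_lane_shuffle(src, idx));
}
```

In this model the two shuffles also agree when an index such as 31 points
into the upper half, since byte k of a broadcast source depends only on
k mod 4.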

See https://godbolt.org/z/eEnd673e6 for details and a comparison with a manual
implementation of the same function using intrinsics.
