https://bugs.llvm.org/show_bug.cgi?id=50189

            Bug ID: 50189
           Summary: Missed optimization to auto-vectorize
                    __builtin_bitreverse
           Product: clang
           Version: unspecified
          Hardware: PC
                OS: Windows NT
            Status: NEW
          Severity: enhancement
          Priority: P
         Component: C++
          Assignee: [email protected]
          Reporter: [email protected]
                CC: [email protected], [email protected],
                    [email protected], [email protected],
                    [email protected]

This code:

  #include <cstdint>

  #define DEFINE_BITREVERSE_LOOP(N) \
      void bitreverse##N##_loop(int##N##_t* values, int n) { \
          for (int i = 0; i < n; i++) { \
              values[i] = __builtin_bitreverse##N(values[i]); \
          } \
      }

  DEFINE_BITREVERSE_LOOP(8)
  DEFINE_BITREVERSE_LOOP(16)
  DEFINE_BITREVERSE_LOOP(32)
  DEFINE_BITREVERSE_LOOP(64)

When compiled with -O3 -march=armv8.2-a+fp16+rcpc+dotprod+crypto+simd+crc,
both clang-9 and clang-11 produce a variety of code sequences (sometimes
auto-vectorized, in the 8-bit case) instead of simply applying the rbit
instruction to vector registers.
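
For reference, the AArch64 vector form of rbit reverses bits within byte
lanes only, so for the 16-, 32- and 64-bit loops a vectorized lowering would
presumably pair it with a byte-order reversal (rev16/rev32/rev64). The
portable sketch below illustrates that decomposition for the 32-bit case in
plain C++. It is not what clang emits; the helper names are made up for
illustration and no compiler builtins are used.

```cpp
#include <cassert>
#include <cstdint>

// Reference implementation: reverse all 32 bits one at a time.
uint32_t bitreverse32_naive(uint32_t x) {
    uint32_t r = 0;
    for (int i = 0; i < 32; i++) {
        r = (r << 1) | (x & 1u);
        x >>= 1;
    }
    return r;
}

// Hypothetical helper: reverse the bits inside each byte independently.
// This models what a byte-lane bit reverse does to one 32-bit element.
uint32_t reverse_bits_in_each_byte(uint32_t x) {
    x = ((x & 0x55555555u) << 1) | ((x >> 1) & 0x55555555u);  // swap adjacent bits
    x = ((x & 0x33333333u) << 2) | ((x >> 2) & 0x33333333u);  // swap bit pairs
    x = ((x & 0x0F0F0F0Fu) << 4) | ((x >> 4) & 0x0F0F0F0Fu);  // swap nibbles
    return x;
}

// Hypothetical helper: reverse the order of the four bytes.
// This models a byte-order reversal within one 32-bit element.
uint32_t reverse_byte_order(uint32_t x) {
    return (x << 24) | ((x & 0x0000FF00u) << 8) |
           ((x >> 8) & 0x0000FF00u) | (x >> 24);
}

// Full 32-bit bit reversal as the composition of the two steps above.
uint32_t bitreverse32_decomposed(uint32_t x) {
    return reverse_byte_order(reverse_bits_in_each_byte(x));
}

// Small self-check comparing the decomposition against the reference.
void self_check() {
    const uint32_t samples[] = {0u, 1u, 0x12345678u, 0xFFFFFFFFu, 0x80000001u};
    for (uint32_t v : samples) {
        assert(bitreverse32_decomposed(v) == bitreverse32_naive(v));
    }
}
```

The same two-step structure would apply to the 16- and 64-bit loops with the
masks and the byte-order reversal widened or narrowed accordingly; the 8-bit
loop needs only the first step.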

This seems a bit ironic, given that __builtin_bitreverse was originally
invented to support the ARM rbit instruction.

