| Field | Value |
|---|---|
| Issue | 183730 |
| Summary | vectorizer fails to vectorize small fixed-size loops getting better with bigger vector sizes |
| Labels | new issue |
| Assignees | |
| Reporter | Disservin |

For small array sizes (e.g. 4, and especially 32), Clang emits very bloated AVX-512 code compared to GCC's output.
The codegen improves as the array size grows and eventually matches GCC's.
https://godbolt.org/z/Pn4Ycd5aK
```cpp
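#include <array>
#include <cstddef>
#include <cstdint>
#include <utility>

// Compile-time table: base[i] = i * i.
template<std::size_t... I>
constexpr std::array<int32_t, sizeof...(I)>
make_base(std::index_sequence<I...>)
{
    return { (int32_t(I) * int32_t(I))... };
}

// Note: x is unused in this reduced example.
template<int SIZE>
std::array<int32_t, SIZE> pow2_and_vector(const std::array<int32_t, SIZE>& x, const std::array<int32_t, SIZE>& y)
{
    constexpr auto base = make_base(std::make_index_sequence<SIZE>{});

    // First loop: per-element variable left shift of a constant table.
    std::array<int32_t, SIZE> yy{};
    for (int i = 0; i < SIZE; ++i) {
        yy[i] = base[i] << y[i];
    }

    // Second loop: isolate the lowest set bit of each element.
    std::array<int32_t, SIZE> d{};
    for (int i = 0; i < SIZE; ++i) {
        d[i] = yy[i] & -yy[i];
    }
    return d;
}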
```
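
To compare the codegen per size without opening the Godbolt link, the template has to be instantiated explicitly. The following is only a sketch of how that might look: the choice of sizes 4, 32 and 256, the helper `sanity_check`, and the suggested flags are assumptions for illustration and are not taken from the report.

```cpp
// Hypothetical reproducer driver. Suggested (assumed) flags:
//   clang++ -O3 -march=x86-64-v4 -S repro.cpp
//   g++     -O3 -march=x86-64-v4 -S repro.cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>

// Same table and kernel as in the report: base[i] = i * i.
template<std::size_t... I>
constexpr std::array<int32_t, sizeof...(I)>
make_base(std::index_sequence<I...>)
{
    return { (int32_t(I) * int32_t(I))... };
}

template<int SIZE>
std::array<int32_t, SIZE> pow2_and_vector(const std::array<int32_t, SIZE>& x, const std::array<int32_t, SIZE>& y)
{
    constexpr auto base = make_base(std::make_index_sequence<SIZE>{});
    std::array<int32_t, SIZE> yy{};
    for (int i = 0; i < SIZE; ++i) {
        yy[i] = base[i] << y[i];
    }
    std::array<int32_t, SIZE> d{};
    for (int i = 0; i < SIZE; ++i) {
        d[i] = yy[i] & -yy[i];
    }
    return d;
}

// Explicit instantiations force the compiler to emit one function per size,
// so the assembly for each size can be inspected side by side.
// The sizes below are assumed, not the exact set used in the report.
template std::array<int32_t, 4> pow2_and_vector<4>(const std::array<int32_t, 4>&, const std::array<int32_t, 4>&);
template std::array<int32_t, 32> pow2_and_vector<32>(const std::array<int32_t, 32>&, const std::array<int32_t, 32>&);
template std::array<int32_t, 256> pow2_and_vector<256>(const std::array<int32_t, 256>&, const std::array<int32_t, 256>&);

// Hypothetical helper: checks that the kernel still computes the same values,
// whatever vectorization strategy the compiler picked.
inline void sanity_check()
{
    const std::array<int32_t, 4> x{};            // unused by the kernel
    const std::array<int32_t, 4> y{0, 1, 2, 3};  // shift amounts
    const auto d = pow2_and_vector<4>(x, y);
    // base = {0, 1, 4, 9}; shifted = {0, 2, 16, 72}; lowest set bits:
    assert(d[0] == 0 && d[1] == 2 && d[2] == 16 && d[3] == 8);
}
```

The template argument has to be spelled out at the call site (`pow2_and_vector<4>(x, y)`) because `SIZE` is declared as `int` while `std::array` takes a `std::size_t`, so deduction from the arguments would fail.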