Issue 183730
Summary vectorizer fails to vectorize small fixed-size loops getting better with bigger vector sizes
Labels new issue
Assignees
Reporter Disservin
For smaller sizes, e.g. 4 and especially 32, Clang emits very bloated AVX-512 code compared to GCC's output.
This improves at larger array sizes, where Clang's codegen eventually matches GCC's.

https://godbolt.org/z/Pn4Ycd5aK

```cpp
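#include <array>
#include <cstddef>
#include <cstdint>
#include <utility>

// Builds {0*0, 1*1, 2*2, ...} at compile time.
template<std::size_t... I>
constexpr std::array<int32_t, sizeof...(I)>
make_base(std::index_sequence<I...>)
{
    return { (int32_t(I) * int32_t(I))... };
}

template<int SIZE>
std::array<int32_t, SIZE> pow2_and_vector(const std::array<int32_t, SIZE>& x, const std::array<int32_t, SIZE>& y)
{
    constexpr auto base = make_base(std::make_index_sequence<SIZE>{});

    // yy[i] = i*i shifted left by y[i]
    std::array<int32_t, SIZE> yy{};
    for (int i = 0; i < SIZE; ++i) {
        yy[i] = base[i] << y[i];
    }

    // d[i] = lowest set bit of yy[i]
    std::array<int32_t, SIZE> d{};
    for (int i = 0; i < SIZE; ++i) {
        d[i] = yy[i] & -yy[i];
    }

    return d;
}

// Explicit instantiations so code is emitted for the sizes mentioned above
// (SIZE is an int, so it cannot be deduced from std::array's size_t argument).
template std::array<int32_t, 4> pow2_and_vector<4>(const std::array<int32_t, 4>&, const std::array<int32_t, 4>&);
template std::array<int32_t, 32> pow2_and_vector<32>(const std::array<int32_t, 32>&, const std::array<int32_t, 32>&);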
```
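To check the vectorized output against expected values, the same per-element result can be computed in a single scalar loop. The sketch below is a hypothetical reference (the name `pow2_and_reference` is not from the report). It folds the two loops of `pow2_and_vector` into one and drops the `x` parameter, which the kernel above never reads. It assumes every `(i * i) << y[i]` fits in a positive `int32_t`, the same range the original kernel needs to avoid overflow.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar reference for pow2_and_vector:
// d[i] = lowest set bit of ((i * i) << y[i]).
// Assumption: each shifted value fits in a positive int32_t.
template <std::size_t N>
constexpr std::array<std::int32_t, N> pow2_and_reference(const std::array<std::int32_t, N>& y)
{
    std::array<std::int32_t, N> d{};
    for (std::size_t i = 0; i < N; ++i) {
        const std::int32_t base = static_cast<std::int32_t>(i * i);
        const std::int32_t v = base << y[i];
        // v & -v keeps only the lowest set bit; 0 stays 0.
        d[i] = v & -v;
    }
    return d;
}
```

For example, `y = {3, 3, 1, 2}` gives `{0, 8, 8, 4}`: lane 3 computes `9 << 2 = 36` (binary `100100`), whose lowest set bit is 4. Because the function is `constexpr`, such expected values can be checked at compile time.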
_______________________________________________
llvm-bugs mailing list
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs
