https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114995

--- Comment #8 from Pratik Chowdhury <pratikc at live dot co.uk> ---
> if you just try to compare __builtin_assume_aligned (x, 32) == x, it will 
> just fold as always true

Aah. Dead code elimination.
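
For reference, a minimal sketch of how I understand the builtin is meant to be used: the alignment promise is attached to the pointer it *returns*, so comparing that result against `x` tells the compiler nothing new. `SumAligned` is a hypothetical helper, not something from the testcase.

```cpp
#include <cstddef>

// Hypothetical helper, for illustration only.
// __builtin_assume_aligned returns its first argument as void*, and the
// compiler may assume that the returned pointer is 32-byte aligned.
// The original pointer x carries no such promise, so all accesses go
// through the returned pointer ax.
float SumAligned(const float* x, std::size_t n) {
    const float* ax =
        static_cast<const float*>(__builtin_assume_aligned(x, 32));
    float sum = 0.0f;
    for (std::size_t i = 0; i != n; ++i) {
        sum += ax[i];
    }
    return sum;
}
```

Note the extra local and the cast back from `void*`, which is the part that gets awkward when the parameter itself is declared `const`.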

> CCing Aldy/Andrew for whether prange can or could be taught to handle the 
> assume cases with uintptr_t and bitwise and + comparison.

Yeah, this could be very helpful in cases such as this one.

@Jakub @Andrew I think [this](https://gcc.godbolt.org/z/MEre8hr71) also has
scope for taking advantage of the same.

```cpp
void MulAddLoopWorksWithBuitInUnreachableAndConst(
    const float* const __restrict mul_array,
    const float* const __restrict add_array,
    const ::std::size_t size,
    float* const __restrict x_array)
{
    if((reinterpret_cast<::std::uintptr_t>(mul_array) & (32-1)) != 0)
    {
        __builtin_unreachable();
    }
    if((reinterpret_cast<::std::uintptr_t>(add_array) & (32-1)) != 0)
    {
        __builtin_unreachable();
    }
    if((reinterpret_cast<::std::uintptr_t>(x_array) & (32-1)) != 0)
    {
        __builtin_unreachable();
    }
    if ((size & (32 - 1)) != 0) __builtin_unreachable();
    for (::std::size_t i = 0; i != size; i++) [[likely]] {
        const auto mul = *(mul_array + i);
        const auto add = *(add_array + i);
        x_array[i] = x_array[i] * mul + add;
        // x_array[i] *= mul;
        // x_array[i] += add;
    }
}
```

Here we are working under the assumption that the arrays are aligned for AVX2,
i.e. that the memory addresses themselves are multiples of 32 (bytes).
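
The mask test used above can be sketched as a small helper. `IsAligned` is a hypothetical name, and this assumes the usual flat mapping from pointers to `uintptr_t` (implementation-defined, but what mainstream targets do). For a power-of-two alignment, `addr & (Alignment - 1)` keeps only the low bits, which are all zero exactly when `addr` is a multiple of `Alignment`.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper, for illustration only.
// Returns true when the address of p is a multiple of Alignment.
template <std::size_t Alignment>
bool IsAligned(const void* p) {
    // The mask trick only works for powers of two.
    static_assert(Alignment != 0 && (Alignment & (Alignment - 1)) == 0,
                  "Alignment must be a power of two");
    // For Alignment == 32 this is the same (addr & (32 - 1)) == 0 test
    // as in the function above.
    return (reinterpret_cast<std::uintptr_t>(p) & (Alignment - 1)) == 0;
}
```

With this, each of the pointer checks would read `if (!IsAligned<32>(mul_array)) __builtin_unreachable();`.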

Clang seems to be able to take advantage of the same here.

If __builtin_assume_aligned isn't really usable for this because of dead code
elimination, then this looks like a nice enough alternative.

It also retains const correctness for me.
