Issue 82198
Summary Incorrect codegen for Zen4, possibly related to zmm register allocation
Labels new issue
Assignees
Reporter Eisenwave
https://godbolt.org/z/8q7v4488z

This isn't exactly a minimal example but:
```cpp
using u64 = unsigned long long;
using u32 = unsigned;
using big = _BitInt(4096);

u32 rem_fast(big x, u32 y) {
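    // Schoolbook long division over 32-bit digits, most significant digit first.
    // The memcpy below assumes a little-endian layout of x.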
    constexpr int size = sizeof(x) / 4;
    u32 digits[size];
    __builtin_memcpy(digits, &x, sizeof(digits));
    u32 rem = 0;
    for (int i = 0; i < size; ++i) {
        u64 temp = u64(rem) << 32 | digits[size - i - 1];
        rem = temp % y;
    }
    return rem;
}

u32 rem_slow(big x, u32 y) {
    return x % y;
}

int main() {
    const u32 divisor = 77777;
    big random = -1;
    random *= 12345;
    random += 1234567;
    random *= 23894238392;
    random += 333333333333333;

    if (rem_slow(random, divisor) != rem_fast(random, divisor)) {
        __builtin_trap();
    }
}
```
For the non-negative value used in `main`, `rem_fast` and `rem_slow` compute the same remainder, so this program is not meant to crash.
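
To show why the two functions should agree, here is a minimal sketch of the same digit-wise loop applied to a plain 64-bit value, so it can be checked against the built-in `%` without `_BitInt` support. The names `rem_digits` and `same_remainder` are hypothetical helpers, not part of the reproducer, and the sketch assumes a little-endian target and a non-zero divisor, as in the original code.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper (not from the report): the loop from rem_fast,
// applied to a 64-bit value made of two 32-bit digits.
std::uint32_t rem_digits(std::uint64_t x, std::uint32_t y) {
    constexpr int size = sizeof(x) / 4;
    std::uint32_t digits[size];
    // Assumes little-endian layout: digits[0] is the least significant digit.
    std::memcpy(digits, &x, sizeof(digits));
    std::uint32_t rem = 0;
    for (int i = 0; i < size; ++i) {
        // Invariant: rem < y, so temp < y * 2^32 and fits in 64 bits.
        // Each step computes (rem * 2^32 + digit) mod y, i.e. Horner's rule.
        std::uint64_t temp = std::uint64_t(rem) << 32 | digits[size - i - 1];
        rem = static_cast<std::uint32_t>(temp % y);
    }
    return rem;
}

// Returns true when the digit-wise remainder matches the built-in x % y.
bool same_remainder(std::uint64_t x, std::uint32_t y) {
    return rem_digits(x, y) == static_cast<std::uint32_t>(x % y);
}
```

Since every step only reduces the running value modulo `y`, the final `rem` equals `x % y` for any unsigned `x`; the 4096-bit version in the reproducer follows the same argument with 128 digits instead of 2.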

It runs just fine with `-march=znver3`, but `-march=znver4` crashes (because of `__builtin_trap()`).

There's clearly something wrong here and I suspect it's related to `zmm` register allocation, since that's the striking difference between Zen 3 and Zen 4 codegen here.