Issue | 82198
Summary | Incorrect codegen for Zen4, possibly related to zmm register allocation
Labels | new issue
Assignees |
Reporter | Eisenwave
https://godbolt.org/z/8q7v4488z
This isn't exactly a minimal example, but it reproduces the problem:
```cpp
using u64 = unsigned long long;
using u32 = unsigned;
using big = _BitInt(4096);

u32 rem_fast(big x, u32 y) {
    //
    constexpr int size = sizeof(x) / 4;
    u32 digits[size];
    __builtin_memcpy(digits, &x, sizeof(digits));
    u32 rem = 0;
    for (int i = 0; i < size; ++i) {
        u64 temp = u64(rem) << 32 | digits[size - i - 1];
        rem = temp % y;
    }
    return rem;
}

u32 rem_slow(big x, u32 y) {
    return x % y;
}

int main() {
    const u32 divisor = 77777;
    big random = -1;
    random *= 12345;
    random += 1234567;
    random *= 23894238392;
    random += 333333333333333;
    if (rem_slow(random, divisor) != rem_fast(random, divisor)) {
        __builtin_trap();
    }
}
```
`rem_fast` and `rem_slow` are equivalent, so this program should never reach `__builtin_trap()`.
It runs just fine with `-march=znver3`, but with `-march=znver4` the comparison fails and the program crashes in `__builtin_trap()`.
There's clearly something wrong here, and I suspect it's related to `zmm` register allocation, since that's the striking difference between the Zen 3 and Zen 4 codegen.
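
For readers who want to convince themselves that the digit-wise loop in `rem_fast` really is equivalent to a plain `%`, here is a minimal sketch of the same technique scaled down to a 64-bit value split into two 32-bit digits. The name `rem_by_digits` is hypothetical and not part of the reproducer, and this sketch does not use `_BitInt` or trigger the miscompile; it only illustrates why the two functions are expected to agree.

```cpp
#include <cstdint>

// Hypothetical helper (not from the report): remainder of a 64-bit value,
// computed one 32-bit digit at a time, most significant digit first.
// This mirrors the loop in rem_fast, with 2 digits instead of 128.
constexpr std::uint32_t rem_by_digits(std::uint64_t x, std::uint32_t y) {
    const std::uint32_t digits[2] = {
        static_cast<std::uint32_t>(x),        // least significant digit
        static_cast<std::uint32_t>(x >> 32),  // most significant digit
    };
    std::uint32_t rem = 0;
    for (int i = 0; i < 2; ++i) {
        // rem < y <= 2^32 - 1, so (rem << 32 | digit) always fits in 64 bits.
        const std::uint64_t temp = std::uint64_t(rem) << 32 | digits[2 - i - 1];
        rem = static_cast<std::uint32_t>(temp % y);
    }
    return rem;
}

// Compile-time check against the built-in operator, using the divisor
// from the report.
static_assert(rem_by_digits(0xFFFFFFFFFFFFFFFFull, 77777u) ==
                  0xFFFFFFFFFFFFFFFFull % 77777u,
              "digit-wise remainder must match operator%");
```

Each step carries the running remainder into the high half of the next 64-bit dividend, which is ordinary schoolbook long division in base 2^32, so the result does not depend on how many digits the input has.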