| Issue | 164802 |
| Summary | [X86] Miscompile when using `-ftrapping-math` |
| Labels | backend:X86, miscompilation, floating-point |
| Assignees | |
| Reporter | abhishek-kaushik22 |

The following C++ code raises a floating-point exception (SIGFPE) when compiled with `-ftrapping-math`, but runs fine without the flag.
```cpp
#include <immintrin.h>
#include <cstdint>
#include <iostream>
#include <cfenv>
__attribute__((noinline)) void masked_div_store(double* a, double* b, uint8_t mask) {
    // Convert i8 mask to __mmask8
    __mmask8 k = static_cast<__mmask8>(mask);
    // Masked load from a and b
    __m512d va = _mm512_maskz_loadu_pd(k, a); // zero-masked load
    __m512d vb = _mm512_maskz_loadu_pd(k, b); // zero-masked load
    // Masked divide: va = va / vb
    __m512d result = _mm512_mask_div_pd(_mm512_setzero_pd(), k, va, vb);
    // Masked store back to a
    _mm512_mask_storeu_pd(a, k, result);
}

int main() {
    const auto res = feenableexcept(FE_DIVBYZERO | FE_INVALID | FE_OVERFLOW);
    double a[4] = {8.0, 16.0, 24.0, 32.0};
    double b[4] = {2.0, 4.0, 6.0, 8.0};
    uint8_t mask = 0xF; // binary 00001111: enables lanes 0 to 3
    masked_div_store(a, b, mask);
    std::cout << "Result in a after masked division:\n";
    for (int i = 0; i < 4; ++i) {
        std::cout << "a[" << i << "] = " << a[i] << "\n";
    }
    return 0;
}
```
```bash
bash$ clang++ -O3 -mavx512f test2.cpp -o no_trap.exe -fuse-ld=lld
bash$ ./no_trap.exe
Result in a after masked division:
a[0] = 4
a[1] = 4
a[2] = 4
a[3] = 4
bash$ clang++ -O3 -mavx512f -ftrapping-math test2.cpp -o trap.exe -fuse-ld=lld
bash$ ./trap.exe
Floating point exception (core dumped)
```
Without the flag, LLVM generates:
```asm
masked_div_store(double*, double*, unsigned char):
        kmovw k1, edx
        vmovupd zmm0 {k1} {z}, zmmword ptr [rdi]
        vdivpd zmm0 {k1} {z}, zmm0, zmmword ptr [rsi]
        vmovupd zmmword ptr [rdi] {k1}, zmm0
        vzeroupper
        ret
```
With the flag, it generates:
```asm
masked_div_store(double*, double*, unsigned char):
        kmovw k1, edx
        vmovupd zmm0 {k1} {z}, zmmword ptr [rdi]
        vmovupd zmm1 {k1} {z}, zmmword ptr [rsi]
        vdivpd zmm0, zmm0, zmm1
        vmovapd zmm0 {k1} {z}, zmm0
        vmovupd zmmword ptr [rdi] {k1}, zmm0
        vzeroupper
        ret
```
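The problem is that the `-ftrapping-math` build performs a full-width `vdivpd` and only applies the mask afterwards (`vmovapd zmm0 {k1} {z}, zmm0`). Lanes 4 to 7 were zeroed by the masked loads, so the unmasked division computes `0.0 / 0.0` in those lanes and raises `FE_INVALID`, which `feenableexcept` has turned into a trap.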
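To illustrate why the order of "divide" and "apply mask" matters, here is a minimal scalar sketch that needs no AVX-512 hardware. It is only a model of the two instruction sequences, not LLVM or intrinsic code: `Vec8`, `maskz_load`, `div_then_select`, `masked_div` and `raises_invalid` are hypothetical names introduced for this example, and the exception flag is inspected with `std::fetestexcept` rather than trapped.

```cpp
#include <array>
#include <cfenv>
#include <cstdint>

// Hypothetical scalar model of an 8-lane double vector (illustration only).
using Vec8 = std::array<double, 8>;

// Models a zero-masked load: lanes whose mask bit is clear become 0.0
// and the corresponding memory is never read.
Vec8 maskz_load(const double* p, std::uint8_t k) {
    Vec8 v{};
    for (int i = 0; i < 8; ++i)
        if (k & (1u << i)) v[i] = p[i];
    return v;
}

// volatile forces the division to happen at run time so the optimizer
// cannot fold it away or drop it as dead code.
double div_runtime(double x, double y) {
    volatile double vx = x, vy = y;
    volatile double q = vx / vy;
    return q;
}

// Models the -ftrapping-math sequence: divide every lane, then zero the
// masked-off lanes. The division also runs for masked-off lanes.
Vec8 div_then_select(const Vec8& a, const Vec8& b, std::uint8_t k) {
    Vec8 r{};
    for (int i = 0; i < 8; ++i) {
        double q = div_runtime(a[i], b[i]);
        r[i] = (k & (1u << i)) ? q : 0.0;
    }
    return r;
}

// Models a masked divide: masked-off lanes are never divided.
Vec8 masked_div(const Vec8& a, const Vec8& b, std::uint8_t k) {
    Vec8 r{};
    for (int i = 0; i < 8; ++i)
        if (k & (1u << i)) r[i] = div_runtime(a[i], b[i]);
    return r;
}

using DivFn = Vec8 (*)(const Vec8&, const Vec8&, std::uint8_t);

// Runs `op` on the inputs from the report and tells whether FE_INVALID
// was raised. Only mask bits 0..3 may be set, since the arrays hold 4 values.
bool raises_invalid(DivFn op, std::uint8_t k) {
    const double a[4] = {8.0, 16.0, 24.0, 32.0};
    const double b[4] = {2.0, 4.0, 6.0, 8.0};
    const Vec8 va = maskz_load(a, k);
    const Vec8 vb = maskz_load(b, k);
    std::feclearexcept(FE_ALL_EXCEPT);
    volatile double sink = op(va, vb, k)[0];
    (void)sink;
    return std::fetestexcept(FE_INVALID) != 0;
}
```

With mask `0x0F`, `raises_invalid(div_then_select, 0x0F)` should report the flag because lanes 4 to 7 compute `0.0 / 0.0`, while `raises_invalid(masked_div, 0x0F)` should not. Both models produce the same values in the enabled lanes; only the exception behaviour differs.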
When the flag is specified, the division in LLVM IR is:
```llvm
%div.i = tail call noundef <8 x double> @llvm.experimental.constrained.fdiv.v8f64(<8 x double> %1, <8 x double> %2, metadata !"round.tonearest", metadata !"fpexcept.strict") #9
```
This is represented as a `strict_fdiv` node in the DAG, and there is no pattern that selects a masked instruction for the strict FP opcodes. (I did find this commit, https://github.com/llvm/llvm-project/commit/dbcc1392b3807d7ddcb000741d2ffb276d90d36b, that removed them.)
Godbolt: https://godbolt.org/z/dMr83s35E