Issue 164802
Summary [X86] Miscompile when using `-ftrapping-math`
Labels backend:X86, miscompilation, floating-point
Assignees
Reporter abhishek-kaushik22
The following C++ code raises a floating-point exception (SIGFPE) when compiled with `-ftrapping-math`, but runs fine without the flag.
```cpp
#include <immintrin.h>
#include <cstdint>
#include <iostream>
#include <cfenv>

__attribute__((noinline)) void masked_div_store(double* a, double* b, uint8_t mask) {
    // Convert i8 mask to __mmask8
    __mmask8 k = static_cast<__mmask8>(mask);

    // Masked load from a and b
    __m512d va = _mm512_maskz_loadu_pd(k, a); // zero-masked load
    __m512d vb = _mm512_maskz_loadu_pd(k, b); // zero-masked load

    // Masked divide: va = va / vb
    __m512d result = _mm512_mask_div_pd(_mm512_setzero_pd(), k, va, vb);

    // Masked store back to a
    _mm512_mask_storeu_pd(a, k, result);
}

int main() {
    const auto res = feenableexcept(FE_DIVBYZERO | FE_INVALID | FE_OVERFLOW);
    double a[4] = {8.0, 16.0, 24.0, 32.0};
    double b[4] = {2.0, 4.0, 6.0, 8.0};

    uint8_t mask = 0xF; // binary: 00001111 — enables lanes 0 to 3

    masked_div_store(a, b, mask);

    std::cout << "Result in a after masked division:\n";
    for (int i = 0; i < 4; ++i) {
        std::cout << "a[" << i << "] = " << a[i] << "\n";
    }

    return 0;
}
```

```bash
bash$ clang++ -O3 -mavx512f test2.cpp -o no_trap.exe -fuse-ld=lld
bash$ ./no_trap.exe 
Result in a after masked division:
a[0] = 4
a[1] = 4
a[2] = 4
a[3] = 4
bash$ clang++ -O3 -mavx512f -ftrapping-math test2.cpp -o trap.exe -fuse-ld=lld
bash$ ./trap.exe
Floating point exception (core dumped)
```

Without the flag, LLVM generates:
```asm
masked_div_store(double*, double*, unsigned char):
        kmovw   k1, edx
        vmovupd zmm0 {k1} {z}, zmmword ptr [rdi]
        vdivpd  zmm0 {k1} {z}, zmm0, zmmword ptr [rsi]
        vmovupd zmmword ptr [rdi] {k1}, zmm0
        vzeroupper
        ret
```

With the flag, it generates:
```asm
masked_div_store(double*, double*, unsigned char):
        kmovw   k1, edx
        vmovupd zmm0 {k1} {z}, zmmword ptr [rdi]
        vmovupd zmm1 {k1} {z}, zmmword ptr [rsi]
        vdivpd  zmm0, zmm0, zmm1
        vmovapd zmm0 {k1} {z}, zmm0
        vmovupd zmmword ptr [rdi] {k1}, zmm0
        vzeroupper
        ret
```

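The problem is that the division is performed at full width instead of under the mask. The zero-masked loads leave lanes 4 to 7 of both operands at 0.0, so the unmasked `vdivpd` computes 0.0 / 0.0 in those lanes and raises the invalid-operation exception, which traps because `FE_INVALID` was enabled with `feenableexcept`.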
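To make the lane-level effect concrete, here is a minimal scalar sketch of the two code sequences. It is only a model, not the real intrinsics: `Vec`, `maskz_load`, and `divide_raises_invalid` are hypothetical names introduced for illustration, and it polls the sticky flags with `std::fetestexcept` instead of trapping, so it does not need AVX-512 or `feenableexcept`. It assumes an IEEE 754 target where 0.0 / 0.0 sets `FE_INVALID`.

```cpp
#include <array>
#include <cassert>
#include <cfenv>
#include <cstdint>

constexpr int kLanes = 8;                  // one zmm register holds 8 doubles
using Vec = std::array<double, kLanes>;    // hypothetical stand-in for __m512d

// Scalar model of a zero-masked load: inactive lanes become 0.0 and the
// corresponding memory is never read.
Vec maskz_load(const double* p, std::uint8_t k) {
    assert(p != nullptr);
    Vec v{};                               // all lanes start at 0.0
    for (int i = 0; i < kLanes; ++i)
        if (k & (1u << i)) v[i] = p[i];
    return v;
}

// Divides a by b lane by lane and reports whether FE_INVALID was raised.
// honor_mask == true  models the masked vdivpd (inactive lanes are skipped).
// honor_mask == false models the full-width vdivpd (all 8 lanes are divided).
bool divide_raises_invalid(const Vec& a, const Vec& b, std::uint8_t k,
                           bool honor_mask) {
    std::feclearexcept(FE_ALL_EXCEPT);
    for (int i = 0; i < kLanes; ++i) {
        if (honor_mask && !(k & (1u << i))) continue;
        // volatile keeps the division at run time so the flag is really set
        volatile double x = a[i];
        volatile double y = b[i];
        volatile double q = x / y;
        (void)q;
    }
    return std::fetestexcept(FE_INVALID) != 0;
}
```

With the arrays and the `0xF` mask from the reproducer, the masked variant divides only lanes 0 to 3 and leaves `FE_INVALID` clear, while the full-width variant also divides the zero-filled lanes 4 to 7 and sets it.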
When the flag is specified, the division in LLVM IR is:
```llvm
%div.i = tail call noundef <8 x double> @llvm.experimental.constrained.fdiv.v8f64(<8 x double> %1, <8 x double> %2, metadata !"round.tonearest", metadata !"fpexcept.strict") #9
``` 

This is represented as a `strict_fdiv` node in the SelectionDAG, and there is no pattern that selects a masked instruction for strict FP opcodes. (I did find commit https://github.com/llvm/llvm-project/commit/dbcc1392b3807d7ddcb000741d2ffb276d90d36b, which removed them.)

Godbolt: https://godbolt.org/z/dMr83s35E