Issue 173030
Summary [Clang 21] Potential bug in lowering of shufflevector
Labels clang
Assignees
Reporter alepping
Reproducer:
https://godbolt.org/z/YEvhYdPo7

Background:
We use the MLIR IR builder to generate MLIR code from C++ code.
We lower the MLIR code to LLVM IR, JIT compile and execute the resulting code.
The problem does not occur with equivalent C code, because the C frontend adds a sign extension instruction for the 8-bit signed integer, which prevents the optimization that causes the problem.

Test case:
C code representing the test case. It does not cause the error when compiled with clang (see godbolt), but it helps clarify the setup:
```C
#include <stdint.h>

void float_division(int8_t i8, int16_t i16, int32_t i32, int64_t i64, float f32, float* results) {
    results[0] = f32 / i8;
    results[1] = f32 / (i8 + 1);
    results[2] = f32 / i16;
    results[3] = f32 / (i16 + 1);
    results[4] = f32 / i32;
    results[5] = f32 / (i32 + 1);
    results[6] = f32 / i64;
    results[7] = f32 / (i64 + 1);
}
```
Input:
`float_division(1,1,1,1,2.0,results)`
Expected result:
{2,1,2,1,2,1,2,1}
Actual result (using the MLIR builder):
{2,1,2,inf,2,1,2,1}
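
For reference, the expected values can be checked with a small harness around the scalar C version. This is only a sketch: `count_mismatches` is a hypothetical helper that is not part of the original reproducer, and the expected values are the ones listed above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Scalar reference: same body as the C test case. */
static void float_division(int8_t i8, int16_t i16, int32_t i32, int64_t i64,
                           float f32, float *results) {
    results[0] = f32 / i8;
    results[1] = f32 / (i8 + 1);
    results[2] = f32 / i16;
    results[3] = f32 / (i16 + 1);
    results[4] = f32 / i32;
    results[5] = f32 / (i32 + 1);
    results[6] = f32 / i64;
    results[7] = f32 / (i64 + 1);
}

/* Hypothetical helper (not in the original reproducer): prints every
   element that differs from the expected result for the input
   (1, 1, 1, 1, 2.0) and returns how many elements differ. */
static int count_mismatches(const float *results) {
    static const float expected[8] = {2, 1, 2, 1, 2, 1, 2, 1};
    int mismatches = 0;
    assert(results != NULL);
    for (int i = 0; i < 8; ++i) {
        /* inf and NaN both compare unequal to the expected value. */
        if (results[i] != expected[i]) {
            printf("results[%d] = %g, expected %g\n",
                   i, (double)results[i], (double)expected[i]);
            ++mismatches;
        }
    }
    return mismatches;
}
```

Calling `float_division(1, 1, 1, 1, 2.0f, results)` followed by `count_mismatches(results)` should report no mismatches for the scalar version; the same check could be applied to the output buffer of the JIT-compiled function.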


Architecture:
See [godbolt](https://godbolt.org/z/YEvhYdPo7)


Problem:
When switching from LLVM 20 to LLVM 21, one of our tests started to fail, returning incorrect results.
The godbolt link shows a boiled-down version.
In essence, the generated LLVM IR looks fine, but the generated assembly appears to operate on poison values and return them.

Details:
We have four signed integers (one 8-bit, one 16-bit, one 32-bit and one 64-bit) and a 32-bit floating point value.
We divide the floating point value by each of the four integers and by each of the four integers + 1 (8 divisions overall).
The generated LLVM IR (not using Clang, but using the MLIR builder) essentially looks like this:
```llvm
define void @faulty(i8 %0, i16 %1, i32 %2, i64 %3, float %4, ptr writeonly captures(none) initializes((0, 32)) %5) local_unnamed_addr #0 {
  %7 = add i8 %0, 1
  %8 = insertelement <2 x i8> poison, i8 %0, i64 0
  %9 = insertelement <2 x i8> %8, i8 %7, i64 1
  %10 = sitofp <2 x i8> %9 to <2 x float>
  %11 = add i16 %1, 1
  %12 = insertelement <2 x i16> poison, i16 %1, i64 0
  %13 = insertelement <2 x i16> %12, i16 %11, i64 1
  %14 = sitofp <2 x i16> %13 to <2 x float>
  %15 = add i32 %2, 1
  %16 = insertelement <2 x i32> poison, i32 %2, i64 0
  %17 = insertelement <2 x i32> %16, i32 %15, i64 1
  %18 = sitofp <2 x i32> %17 to <2 x float>
  %19 = sitofp i64 %3 to float
  %20 = add i64 %3, 1
  %21 = sitofp i64 %20 to float
  %22 = insertelement <8 x float> poison, float %4, i64 0
  %23 = shufflevector <8 x float> %22, <8 x float> poison, <8 x i32> zeroinitializer
  %24 = shufflevector <2 x float> %10, <2 x float> %14, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 poison, i32 poison, i32 poison, i32 poison>
  %25 = shufflevector <2 x float> %18, <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
  %26 = shufflevector <8 x float> %24, <8 x float> %25, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 poison, i32 poison>
  %27 = insertelement <8 x float> %26, float %19, i64 6
  %28 = insertelement <8 x float> %27, float %21, i64 7
  %29 = fdiv <8 x float> %23, %28
  store <8 x float> %29, ptr %5, align 4
  ret void
}
```
As far as we can see, the LLVM IR looks fine.
However, the generated assembly code:
```asm
faulty:
        lea     eax, [rdi + 1]
        vmovd   xmm1, edi
        vpinsrb xmm1, xmm1, eax, 1
        vmovd   xmm2, esi
        inc     esi
        vpinsrw xmm2, xmm2, esi, 1
        vmovd   xmm3, edx
        inc     edx
        vpinsrd xmm3, xmm3, edx, 1
        vcvtsi2ss xmm4, xmm15, rcx
        vpunpckldq      xmm1, xmm1, xmm2
        vcvtdq2ps       xmm2, xmm3
        inc     rcx
        vcvtsi2ss xmm3, xmm15, rcx
        vbroadcastss    ymm0, xmm0
        vpmovsxbd ymm1, xmm1
        vcvtdq2ps       ymm1, ymm1
        vinsertf128     ymm2, ymm1, xmm2, 1
        vextractf128    xmm1, ymm1, 1
        vpunpcklqdq ymm1, ymm2, ymm1
        vbroadcastss    ymm2, xmm4
        vblendps ymm1, ymm1, ymm2, 64
        vbroadcastss    ymm2, xmm3
        vblendps ymm1, ymm1, ymm2, 128
        vdivps  ymm0, ymm0, ymm1
        vmovups ymmword ptr [r8], ymm0
        vzeroupper
        ret
```
appears to be faulty.
As far as we can see (observed while debugging), these instructions
```asm
vpmovsxbd       ymm1, xmm1
vcvtdq2ps       ymm1, ymm1
```
can load poison values (garbage) into the upper elements of the vector register and then convert these poison values to floats.
The subsequent instructions use these poisoned floats, so they become part of the result.
The [godbolt reproducer](https://godbolt.org/z/YEvhYdPo7) demonstrates this behavior, causing `inf` to become part of the result.
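
One possible reading of the assembly, offered as an assumption rather than a confirmed diagnosis: `vpmovsxbd` sign-extends each of the low 8 bytes of its source to a 32-bit element, and the preceding `vpunpckldq` appears to place the 16-bit pair next to the 8-bit pair in those bytes. The sketch below models that in plain C. The byte layout in `pack_low_bytes` and both function names are hypothetical and only illustrate how a zero divisor (and therefore `inf`) could arise.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Models the per-element behaviour of vpmovsxbd: each of the low
   8 bytes is sign-extended to a 32-bit element. */
static void sign_extend_bytes(const uint8_t bytes[8], int32_t lanes[8]) {
    assert(bytes != NULL && lanes != NULL);
    for (int i = 0; i < 8; ++i)
        lanes[i] = (int8_t)bytes[i];
}

/* ASSUMPTION (not stated in the report): after vpunpckldq the low
   8 bytes hold the i8 pair in bytes 0-1, don't-care bytes in 2-3,
   and the i16 pair as two little-endian 16-bit words in bytes 4-7. */
static void pack_low_bytes(int8_t i8, int16_t i16, uint8_t bytes[8]) {
    uint16_t w0 = (uint16_t)i16;
    uint16_t w1 = (uint16_t)(i16 + 1);
    bytes[0] = (uint8_t)i8;
    bytes[1] = (uint8_t)(i8 + 1);
    bytes[2] = 0; /* unspecified in the real code, zero here */
    bytes[3] = 0;
    bytes[4] = (uint8_t)(w0 & 0xFF);
    bytes[5] = (uint8_t)(w0 >> 8);
    bytes[6] = (uint8_t)(w1 & 0xFF);
    bytes[7] = (uint8_t)(w1 >> 8);
}
```

With `i8 = 1` and `i16 = 1`, the modelled elements 4 to 7 come out as 1, 0, 2, 0: the high byte of each 16-bit value is treated as a separate element. Dividing 2.0 by such a zero element under IEEE 754 arithmetic would yield `inf`.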