Issue 151692
Summary LLVM incorrectly calls __truncsfbf2 after a bfloat function call in unoptimized code on x86
Labels new issue
Assignees
Reporter johnplatts
Here is an LLVM IR snippet that is compiled incorrectly on x86: https://godbolt.org/z/4Ezb51We8

Here is the code that should be generated for the above snippet on x86_64 (without the incorrect __truncsfbf2 call):
```
BitCastI16ToBF16Wrapper:                # @BitCastI16ToBF16Wrapper
        push    rax
        call    BitCastI16ToBF16
        lea     rdi, [rsp + 6]
        call    CreateBF16WrapperFromBF16
        mov     ax, word ptr [rsp + 6]
        pop     rcx
        ret
BitCastI16ToBF16:                       # @BitCastI16ToBF16
        pinsrw  xmm0, word ptr [rdi], 0
        ret
CreateBF16WrapperFromBF16:              # @CreateBF16WrapperFromBF16
        pextrw  eax, xmm0, 0
        mov     word ptr [rdi], ax
        ret
```
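
To illustrate why a spurious float-to-bfloat conversion matters, here is a minimal C sketch of what could happen to a value that is already BF16 when it is converted a second time. It is not the compiler-rt implementation: `f32_bits_to_bf16_bits` and `spurious_truncation` are hypothetical helpers, the rounding is assumed to be round-to-nearest-even, NaN handling is simplified, and the upper 16 bits of the 32-bit lane are assumed to be zero (`pinsrw` does not guarantee that).

```c
#include <stdint.h>

/* Hypothetical helper: convert a binary32 bit pattern to a bfloat16 bit
   pattern with round-to-nearest-even. This only approximates what a
   float -> bfloat truncation routine does; NaN handling is simplified. */
static uint16_t f32_bits_to_bf16_bits(uint32_t f32_bits) {
    if ((f32_bits & 0x7FFFFFFFu) > 0x7F800000u) {
        /* NaN: keep the top 16 bits and force the quiet bit. */
        return (uint16_t)((f32_bits >> 16) | 0x0040u);
    }
    /* Add 0x7FFF plus the lowest kept bit so that ties round to even. */
    uint32_t rounding_bias = 0x7FFFu + ((f32_bits >> 16) & 1u);
    return (uint16_t)((f32_bits + rounding_bias) >> 16);
}

/* Models the suspected failure: a BF16 value held in the low 16 bits of a
   32-bit lane is misread as a binary32 value and converted again.
   Assumption: the upper 16 bits of the lane are zero. */
static uint16_t spurious_truncation(uint16_t bf16_bits) {
    return f32_bits_to_bf16_bits((uint32_t)bf16_bits);
}
```

Under these assumptions, the BF16 bit pattern `0x3F80` (1.0) is read as the binary32 denormal `0x00003F80`, and `spurious_truncation(0x3F80)` returns `0x0000`, so the original value is lost.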

There is also a second bug on x86_32: codegen assumes that the result of BitCastI16ToBF16 is returned as a 32-bit floating-point value in `st(0)` rather than as a BF16 value in the `xmm0` register.
