Issue | 151692
Summary | LLVM incorrectly calls __truncsfbf2 after a bfloat function call in unoptimized code on x86
Labels | new issue
Assignees |
Reporter | johnplatts

Here is an LLVM IR snippet that has incorrect codegen on x86: https://godbolt.org/z/4Ezb51We8
Here is the code that should have been generated for the above snippet on x86_64 (without the incorrect __truncsfbf2 call):
```
BitCastI16ToBF16Wrapper: # @BitCastI16ToBF16Wrapper
        push rax
        call BitCastI16ToBF16
        lea rdi, [rsp + 6]
        call CreateBF16WrapperFromBF16
        mov ax, word ptr [rsp + 6]
        pop rcx
        ret
BitCastI16ToBF16: # @BitCastI16ToBF16
        pinsrw xmm0, word ptr [rdi], 0
        ret
CreateBF16WrapperFromBF16: # @CreateBF16WrapperFromBF16
        pextrw eax, xmm0, 0
        mov word ptr [rdi], ax
        ret
```
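To illustrate why a spurious __truncsfbf2 call is harmful and not just redundant, here is a rough Python sketch. It rests on several assumptions that are not taken from the Godbolt link: `truncsfbf2_model` is a hypothetical, simplified stand-in for float32-to-bfloat16 truncation (round to nearest even, NaN handling omitted), the callee is assumed to leave the BF16 bits in the low 16 bits of `xmm0`, and the upper bits are assumed to be zero (which `pinsrw` alone does not guarantee).

```python
def truncsfbf2_model(f32_bits: int) -> int:
    """Simplified model of float32 -> bfloat16 truncation (round to nearest even).

    Illustrative stand-in only, not the compiler-rt source; NaN handling is omitted.
    """
    rounding_bias = 0x7FFF + ((f32_bits >> 16) & 1)
    return ((f32_bits + rounding_bias) >> 16) & 0xFFFF


BF16_ONE = 0x3F80  # bfloat16 bit pattern for 1.0

# Sanity check: truncating float32 1.0 (0x3F800000) yields bfloat16 1.0.
correct = truncsfbf2_model(0x3F800000)

# Hypothetical buggy path: the callee already returned a bfloat16 in the low
# 16 bits of xmm0, but the caller reinterprets the low 32 bits as a float32.
# Assumption: the upper 16 bits of that 32-bit lane happen to be zero.
xmm0_low32 = BF16_ONE
after_spurious_call = truncsfbf2_model(xmm0_low32)

print(hex(correct), hex(after_spurious_call))  # prints: 0x3f80 0x0
```

Under these assumptions, the value 1.0 is reinterpreted as a tiny denormal float32 and rounded to zero, so the extra call would corrupt the result instead of leaving it unchanged.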
There is also a separate bug on x86_32: the generated code assumes that the result of BitCastI16ToBF16 is returned as a 32-bit floating-point value in `st(0)` instead of as a BF16 value in the `xmm0` register.