================
@@ -22,10 +22,7 @@ define void @add(ptr %pa, ptr %pb, ptr %pc) nounwind {
 ; X86-NEXT:    vaddss %xmm0, %xmm1, %xmm0
 ; X86-NEXT:    vmovss %xmm0, (%esp)
 ; X86-NEXT:    calll __truncsfbf2
-; X86-NEXT:    fstps {{[0-9]+}}(%esp)
-; X86-NEXT:    vmovd {{.*#+}} xmm0 = mem[0],zero,zero,zero
-; X86-NEXT:    vmovd %xmm0, %eax
-; X86-NEXT:    movw %ax, (%esi)
+; X86-NEXT:    vmovsh %xmm0, (%esi)
----------------
phoebewang wrote:

`vmovsh` can store the low 16-bit to memory directly.
The original codes has ABI mistake, which store `bf16` without `f32`. Since 
`f32` uses X87 registers on 32-bit target, it need to store to another memory 
first, reload to store again.
The patch makes the result in XMM0, so one `vmovsh` is enough.

https://github.com/llvm/llvm-project/pull/76901
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to