Issue 180630
Summary Suboptimal lowering of float bitwise ops on targets without hardware support
Labels
Assignees
Reporter tgross35
Demo: https://llvm.godbolt.org/z/7E6xxn47b. Input:

```llvm
define zeroext i1 @foo(half %x) unnamed_addr {
start:
  %i = bitcast half %x to i16
  %masked = and i16 %i, 32767
  %r = icmp eq i16 %masked, 0
  ret i1 %r
}
```

InstCombine folds the mask-and-compare into a floating-point comparison:

```llvm
define zeroext i1 @foo(half %x) unnamed_addr {
start:
  %r = fcmp oeq half %x, 0xH0000
  ret i1 %r
}
```
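
This fold preserves the result. IEEE equality treats `-0.0` and `+0.0` as equal and NaN as unequal to everything, and every NaN has an all-ones exponent, so `x == 0.0` holds exactly when every bit except the sign is zero. A minimal Rust sketch can check this exhaustively for `half`. It works on raw `u16` bits so it does not need a native `f16` type. `half_to_f32` is a hypothetical, hand-written widening conversion standing in for the `__extendhfsf2` libcall; it is not compiler-rt's implementation.

```rust
/// Integer-only test, mirroring the original IR:
/// `and i16 %i, 32767` followed by `icmp eq i16 %masked, 0`.
fn half_bits_is_zero(bits: u16) -> bool {
    (bits & 0x7fff) == 0
}

/// Hypothetical stand-in for `__extendhfsf2`: widens IEEE binary16 bits to an
/// f32. Every binary16 value is exactly representable as an f32.
fn half_to_f32(bits: u16) -> f32 {
    let sign = (bits as u32 & 0x8000) << 16; // move the sign to bit 31
    let exp = (bits as u32 >> 10) & 0x1f; // 5-bit exponent, bias 15
    let mant = bits as u32 & 0x3ff; // 10-bit significand
    let magnitude = match exp {
        // Zero or subnormal: mant * 2^-24, exact in f32.
        0 => mant as f32 / 16_777_216.0,
        // Infinity (mant == 0) or NaN (mant != 0).
        0x1f => f32::from_bits(0x7f80_0000 | (mant << 13)),
        // Normal: rebias the exponent from 15 to 127 and widen the significand.
        _ => f32::from_bits(((exp + 112) << 23) | (mant << 13)),
    };
    f32::from_bits(magnitude.to_bits() | sign)
}

/// The InstCombine form after widening: `fcmp oeq half %x, 0xH0000`.
fn half_eq_zero_via_f32(bits: u16) -> bool {
    half_to_f32(bits) == 0.0
}
```

The check `(0..=u16::MAX).all(|b| half_bits_is_zero(b) == half_eq_zero_via_f32(b))` covers every `half` bit pattern, including both zeros, subnormals, infinities, and NaNs, and returns `true`.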

Then, on x86-64, the backend generates:

```asm
foo:
        push    rax
        call    __extendhfsf2@PLT
        xorps   xmm1, xmm1
        cmpeqss xmm1, xmm0
        movd    eax, xmm1
        and     eax, 1
        pop     rcx
        ret
```

Lowering the original bitwise ops directly would take about four instructions. The generated code is significantly worse, mainly because of the call to `__extendhfsf2`.

This happens for every floating-point type without hardware support. For example, https://rust.godbolt.org/z/oYWYnja4q emits a libcall for each of `half`, `float`, `double`, and `fp128`, none of which should be needed.
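
The same mask-and-test works at every width, and it is the kind of lowering the report asks for where there is no hardware support. Below is a minimal Rust sketch for the other formats named in the Rust reproduction. The `half` case is the `and i16 %i, 32767` from the input IR. The function names are hypothetical. `float` and `double` go through `to_bits`. `fp128` takes its raw `u128` bits, assuming the IEEE binary128 layout (1 sign bit, 15 exponent bits, 112 significand bits). Each function clears only the sign bit and tests the rest with integer operations, so none of them needs a floating-point comparison or a soft-float libcall.

```rust
// Sign-agnostic zero tests: clear the sign bit, then check that the
// exponent and significand bits are all zero. Only integer operations are used.

/// `float` (IEEE binary32).
fn f32_is_zero_bitwise(x: f32) -> bool {
    (x.to_bits() & 0x7fff_ffff) == 0
}

/// `double` (IEEE binary64).
fn f64_is_zero_bitwise(x: f64) -> bool {
    (x.to_bits() & 0x7fff_ffff_ffff_ffff) == 0
}

/// `fp128` (IEEE binary128), taking its raw bits since no native type is assumed.
fn f128_bits_is_zero(bits: u128) -> bool {
    (bits & !(1u128 << 127)) == 0
}
```

For example, `f32_is_zero_bitwise(-0.0)` is `true`, while `f32_is_zero_bitwise(f32::NAN)` and `f64_is_zero_bitwise(f64::MIN_POSITIVE)` are `false`, matching `x == 0.0` in each case.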
_______________________________________________
llvm-bugs mailing list
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs
