[llvm-bugs] [Bug 141931] AMDGPU should not scalarize v2f16 / v2bf16 copysign

LLVM Bugs via llvm-bugs Thu, 29 May 2025 04:36:21 -0700

Issue	141931
Summary	AMDGPU should not scalarize v2f16 / v2bf16 copysign
Labels	backend:AMDGPU, missed-optimization
Assignees
Reporter	arsenm

Currently, half-element copysign is scalarized, producing this ugly expansion:

```llvm
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 < %s

; s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; s_movk_i32 s4, 0x7fff
; v_bfi_b32 v2, s4, v0, v1
; v_lshrrev_b32_e32 v1, 16, v1
; v_lshrrev_b32_e32 v0, 16, v0
; v_bfi_b32 v0, s4, v0, v1
; s_mov_b32 s4, 0x5040100
; v_perm_b32 v0, v0, v2, s4
; s_setpc_b64 s[30:31]
define <2 x half> @copysign_v2f16(<2 x half> %a, <2 x half> %b) {
  %result = call <2 x half> @llvm.copysign.v2f16(<2 x half> %a, <2 x half> %b)
  ret <2 x half> %result
}

```
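The Python sketch below makes the cost of the scalarized form concrete. It replays the gfx900 sequence above one instruction at a time on raw 32-bit register values. The helpers (`v_bfi_b32`, `v_perm_b32`, `copysign_v2f16_scalarized`) are hypothetical models written for illustration. They assume the usual ISA semantics: BFI computes `(S0 & S1) | (~S0 & S2)`, and `v_perm_b32` picks bytes from the 64-bit concatenation `S0:S1`. Only the plain byte-select selector values 0-7 are modelled.

```python
MASK32 = 0xFFFFFFFF

def v_bfi_b32(mask, src1, src2):
    """Bitfield insert: bits of src1 where mask is set, bits of src2 elsewhere."""
    return ((mask & src1) | (~mask & src2)) & MASK32

def v_perm_b32(src0, src1, sel):
    """Byte permute: result byte i is byte sel_i of the 64-bit value src0:src1.

    Only selector values 0-7 (plain byte selects) are modelled; the special
    selector values defined by the hardware are not.
    """
    combined = ((src0 & MASK32) << 32) | (src1 & MASK32)
    result = 0
    for i in range(4):
        s = (sel >> (8 * i)) & 0xFF
        assert s < 8, "special selector values are not modelled"
        result |= ((combined >> (8 * s)) & 0xFF) << (8 * i)
    return result

def copysign_v2f16_scalarized(a, b):
    """Replays the gfx900 scalarized sequence; element 0 is in the low 16 bits."""
    s4 = 0x7FFF
    v2 = v_bfi_b32(s4, a, b)             # low lane: magnitude of a, sign of b
    v1 = (b & MASK32) >> 16              # v_lshrrev_b32 v1, 16, v1
    v0 = (a & MASK32) >> 16              # v_lshrrev_b32 v0, 16, v0
    v0 = v_bfi_b32(s4, v0, v1)           # high lane, same BFI again
    return v_perm_b32(v0, v2, 0x5040100) # pack low halves: v0 -> bits 16-31, v2 -> bits 0-15
```

For example, with `a = 0xC0003C00` (halves `1.0`, `-2.0`) and `b = 0x3C008000` (halves `-0.0`, `1.0`), the sequence yields `0x4000BC00` (halves `-1.0`, `2.0`). Five VALU instructions plus two SGPR constants go into what is, bit for bit, a single masked merge.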

If I hack up the vector legalizer's logic, the default expansion finds a vector BFI:

With gfx803:

```
	s_mov_b32 s4, 0x7fff7fff
	v_bfi_b32 v0, s4, v0, v1
```
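One BFI is enough in the packed case because the sign is bit 15 of each 16-bit lane. The splatted mask `0x7fff7fff` therefore takes both magnitudes from the first operand and both sign bits from the second. bf16 also keeps its sign in bit 15, so the same mask should serve v2f16 and v2bf16 alike. Below is a minimal Python model, assuming element 0 sits in the low 16 bits. The helper names are illustrative only.

```python
import struct

MASK32 = 0xFFFFFFFF

def v_bfi_b32(mask, src1, src2):
    """(mask & src1) | (~mask & src2), truncated to 32 bits."""
    return ((mask & src1) | (~mask & src2)) & MASK32

def pack_v2f16(lo, hi):
    """Pack two Python floats into a <2 x half> bit pattern (element 0 in bits 0-15)."""
    return int.from_bytes(struct.pack("<ee", lo, hi), "little")

def unpack_v2f16(bits):
    """Inverse of pack_v2f16."""
    return struct.unpack("<ee", bits.to_bytes(4, "little"))

def copysign_v2i16_bits(a, b):
    # Bits 0-14 and 16-30 (both magnitudes) come from a;
    # bits 15 and 31 (both signs) come from b. The lane layout is the
    # same for f16 and bf16, so this works on either element type.
    return v_bfi_b32(0x7FFF7FFF, a, b)
```

For instance, `unpack_v2f16(copysign_v2i16_bits(pack_v2f16(1.0, -2.0), pack_v2f16(-0.0, 3.0)))` gives `(-1.0, 2.0)`.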

With gfx9+, it does worse:
```
	v_and_b32_e32 v1, 0x80008000, v1
	s_mov_b32 s4, 0x7fff7fff
	v_and_or_b32 v0, v0, s4, v1
```
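The gfx9 sequence produces exactly the same bits as the single BFI. It just spends an extra VALU instruction (with a 32-bit literal) isolating the sign bits first. The randomized equivalence check below is a Python sketch. Its helpers are hypothetical models of `v_and_b32`, `v_and_or_b32` (assumed to compute `(S0 & S1) | S2`), and `v_bfi_b32`.

```python
import random

MASK32 = 0xFFFFFFFF

def v_bfi_b32(mask, src1, src2):
    return ((mask & src1) | (~mask & src2)) & MASK32

def v_and_b32(src0, src1):
    return src0 & src1 & MASK32

def v_and_or_b32(src0, src1, src2):
    """(src0 & src1) | src2"""
    return ((src0 & src1) | src2) & MASK32

def copysign_v2f16_gfx9(a, b):
    v1 = v_and_b32(0x80008000, b)           # keep only b's two sign bits
    return v_and_or_b32(a, 0x7FFF7FFF, v1)  # clear a's sign bits, OR in b's

def copysign_v2f16_bfi(a, b):
    return v_bfi_b32(0x7FFF7FFF, a, b)      # the gfx803 single-instruction form

rng = random.Random(0)
for _ in range(10_000):
    a, b = rng.getrandbits(32), rng.getrandbits(32)
    assert copysign_v2f16_gfx9(a, b) == copysign_v2f16_bfi(a, b)
```

The two forms are interchangeable whenever the mask and its complement partition the word, which is the case here. The gfx9 output is therefore strictly more expensive, with no semantic benefit.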


We can trivially extend the existing legal f16 copysign pattern to handle the two-element case, as in the gfx8 output. Supporting the cases where the sign source is a different FP type takes a little more work.
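Here is one possible shape for the mixed-type case, sketched under assumptions. The magnitude is `<2 x half>`, and the sign comes from a `<2 x float>` held in two 32-bit registers. Lane 1's sign is already at bit 31. Lane 0's sign only needs to move from bit 31 to bit 15. So gathering the high halves of both f32 values into one register should let a single BFI finish the job. This is an illustration, not the backend's actual lowering, and the function name is hypothetical.

```python
import struct

MASK32 = 0xFFFFFFFF

def v_bfi_b32(mask, src1, src2):
    return ((mask & src1) | (~mask & src2)) & MASK32

def f32_bits(x):
    """Raw IEEE-754 single-precision bits of a Python float."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def pack_v2f16(lo, hi):
    return int.from_bytes(struct.pack("<ee", lo, hi), "little")

def unpack_v2f16(bits):
    return struct.unpack("<ee", bits.to_bytes(4, "little"))

def copysign_v2f16_sign_from_v2f32(mag, sign_lo, sign_hi):
    """Hypothetical lowering: <2 x half> magnitude, signs from two f32 registers."""
    # Gather bits 16-31 of sign_lo into bits 0-15 and keep bits 16-31 of
    # sign_hi in place. On hardware this is presumably one v_perm_b32
    # (byte selector 0x07060302 under byte-select semantics).
    packed_signs = (sign_hi & 0xFFFF0000) | ((sign_lo & MASK32) >> 16)
    # Now both sign bits sit at 15 and 31, so the usual packed BFI applies.
    return v_bfi_b32(0x7FFF7FFF, mag, packed_signs)
```

For example, a magnitude of `(1.5, -4.0)` with f32 sign sources `-7.0` and `0.25` should give `(-1.5, 4.0)`.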

_______________________________________________
llvm-bugs mailing list
llvm-bugs@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs

