Issue 164399
Summary [X86] Poor AVX512 codegen with constant predicate
Labels backend:X86, missed-optimization
Assignees
Reporter RKSimon
Noticed while reviewing constexpr handling of the predicated arithmetic:
```ll
define <16 x i32> @add(<16 x i32> %x, <16 x i32> %y) {
  %add = add <16 x i32> %y, %x
  %res = shufflevector <16 x i32> %add, <16 x i32> zeroinitializer, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31>
  ret <16 x i32> %res
}
```
```asm
add: # @add
  vpaddd %zmm0, %zmm1, %zmm0
  movw $255, %ax
  kmovd %eax, %k1
  vpexpandd %zmm0, %zmm0 {%k1} {z}
  retq
```
Several things go wrong here:
1. The shuffle is lowered as an expansion (`vpexpandd`) instead of a select, which would fold into a predicated (zero-masked) `vpaddd`.
2. The 0xFF predicate mask is built with movw/kmovd instead of being rematerialized directly with kxnorb.
3. The shuffle zeroes the upper 256 bits of the vector, so this could have been done as just `vpaddd %ymm0, %ymm1, %ymm0`, since a 256-bit operation implicitly zeroes the upper half of the zmm register.
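As a sanity check on the reasoning above, here is a minimal Python sketch that models the IR lane by lane. It is only an illustration, not anything from LLVM: the helpers `shufflevector`, `masked_select_zero`, and `add_lanes` are hypothetical names, and the input values are arbitrary. It shows that the shuffle mask from the IR picks lanes 0-7 of the sum and lanes 8-15 of the zero vector, which matches a zero-masked select with mask 0x00FF.

```python
def shufflevector(a, b, mask):
    """Model LLVM's shufflevector: index i < len(a) picks a[i], else b[i - len(a)]."""
    n = len(a)
    return [a[i] if i < n else b[i - n] for i in mask]


def masked_select_zero(a, kmask):
    """Model zero-masking: lane i keeps a[i] if bit i of kmask is set, else 0."""
    return [v if (kmask >> i) & 1 else 0 for i, v in enumerate(a)]


def add_lanes(x, y):
    """Model a lane-wise i32 add with wraparound."""
    return [(a + b) & 0xFFFFFFFF for a, b in zip(x, y)]


LANES = 16
# Shuffle mask copied from the IR: 0..7 from %add, 24..31 from zeroinitializer.
SHUFFLE_MASK = list(range(0, 8)) + list(range(24, 32))

# Arbitrary example inputs (assumption, chosen only for illustration).
x = list(range(1, 17))
y = [100 * v for v in range(1, 17)]

total = add_lanes(x, y)
via_shuffle = shufflevector(total, [0] * LANES, SHUFFLE_MASK)
via_select = masked_select_zero(total, 0x00FF)

print(via_shuffle == via_select)
```

Because the upper eight lanes are always zero in this model, only the low 256 bits of the sum carry information, which is the observation behind point 3.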
