| Issue | 55447 |
| --- | --- |
| Summary | [RISCV] Poor vector codegen for integer max reduction with Zvl512b |
| Labels | new issue |
| Assignees | |
| Reporter | preames |

https://godbolt.org/z/8vj83rj14

Here's a slightly simplified reproducer:

```
$ cat vector_max_reduce.c
int vector_max_reduce_i32(int* a, unsigned a_len) {
  int max = -987654321;
  for (unsigned i = 0; i < a_len; i++)
    max = (a[i] > max) ? a[i] : max;
  return max;
}

$ clang -S vector_max_reduce.c --target=riscv64 -mllvm -riscv-v-vector-bits-min=512 -Xclang -target-feature -Xclang +v,+f,+m,+d,+zba -O2 -emit-llvm
```

Key observations here:

1. We only get the double vector loop structure (a main vector loop followed by a vector epilogue loop) with Zvl512b and above.  Below that, we generate a much more reasonable single vector loop and scalar epilogue.  The opcode (max vs add) does not seem to influence this choice.
2. Even if we generate the vector epilogue, we don't need to do a vector-to-scalar reduction in between.  We do need to do a partial reduction (in this case from 16 lanes to 8 lanes), but we'd be better off deferring the final scalar reduction until the end.
3. In the final assembly (key block copied below), we seem to have folded the scalar reduction step back into the main vector body.  Note that the IR does not have this form!

```
	addi	a4, a5, 64
	vle32.v	v10, (a5)
	vle32.v	v11, (a4)
	vmax.vv	v8, v10, v8
	vmax.vv	v9, v11, v9
	addi	a1, a1, -32
	addi	a5, a5, 128
	bnez	a1, .LBB0_7
	vmax.vv	v8, v8, v9
	lui	a1, 524288
	vmv.s.x	v9, a1
	vredmax.vs	v8, v8, v9
	vmv.x.s	a1, v8
	beq	a3, a2, .LBB0_13
	andi	a4, a2, 24
	beqz	a4, .LBB0_14
```
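
To make observation 2 concrete, here is a rough scalar model in C of the loop structure we'd prefer. This is only a sketch, not what the vectorizer emits: the lane counts (16 for the main loop, 8 for the epilogue) are taken from the observation above, the arrays `acc16`/`acc8` stand in for vector registers, and the names `deferred_max`, `reference_max`, `MAIN_LANES` and `EPI_LANES` are made up for illustration. It also ignores the interleaving by two visible in the assembly.

```c
#include <assert.h>
#include <stddef.h>

#define MAIN_LANES 16u /* assumed: i32 lanes per main-loop vector */
#define EPI_LANES   8u /* assumed: i32 lanes per epilogue vector */

static int max_i32(int x, int y) { return x > y ? x : y; }

/* Plain scalar loop, same as the reproducer. */
int reference_max(const int *a, unsigned a_len) {
  int max = -987654321;
  for (unsigned i = 0; i < a_len; i++)
    max = max_i32(a[i], max);
  return max;
}

/* Hypothetical model of the preferred structure: the accumulator stays
   "in vector form" across both vector loops and is reduced to a scalar
   exactly once, after the vector epilogue. */
int deferred_max(const int *a, unsigned a_len) {
  assert(a != NULL || a_len == 0);
  const int init = -987654321;
  int acc16[MAIN_LANES];
  int acc8[EPI_LANES];
  unsigned i = 0;

  for (unsigned l = 0; l < MAIN_LANES; l++)
    acc16[l] = init;

  /* Main vector loop: one lane-wise max per 16 elements. */
  for (; a_len - i >= MAIN_LANES; i += MAIN_LANES)
    for (unsigned l = 0; l < MAIN_LANES; l++)
      acc16[l] = max_i32(acc16[l], a[i + l]);

  /* Partial reduction, 16 lanes -> 8 lanes. Still a vector value. */
  for (unsigned l = 0; l < EPI_LANES; l++)
    acc8[l] = max_i32(acc16[l], acc16[l + EPI_LANES]);

  /* Vector epilogue: one lane-wise max per 8 elements. */
  for (; a_len - i >= EPI_LANES; i += EPI_LANES)
    for (unsigned l = 0; l < EPI_LANES; l++)
      acc8[l] = max_i32(acc8[l], a[i + l]);

  /* The only vector-to-scalar reduction, deferred to the end. */
  int max = init;
  for (unsigned l = 0; l < EPI_LANES; l++)
    max = max_i32(max, acc8[l]);

  /* Scalar remainder. */
  for (; i < a_len; i++)
    max = max_i32(max, a[i]);
  return max;
}
```

The point of the model is just that the only horizontal reduction sits after the epilogue; the step between the two vector loops is a lane-wise max of the two halves, which stays in vector form.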
