| Issue | 61579 |
| Summary | [optimization] gcc generates better code than clang for predicated SVE |
| Labels | new issue |
| Assignees | |
| Reporter | vfdff |
* test case: https://gcc.godbolt.org/z/d3n1aPanv
```
int check(char* mask, float *result, int n) {
    int count = 0;
    for(int j = 0; j < n; j++){
        if((mask[j] == 0) && (result[j] != 2.0)){
            count ++;
        }
    }
    return count;
}
```
* gcc's kernel loop
```
.L3:
ld1b z1.s, p1/z, [x0, x3]
cmpeq p0.b, p1/z, z1.b, #0
ld1w z1.s, p0/z, [x1, x3, lsl 2]
add x3, x3, x4
fcmne p0.s, p0/z, z1.s, z3.s
and p0.b, p0/z, p1.b, p1.b
whilelo p1.s, w3, w2
add z0.s, p0/m, z0.s, z2.s
b.any .L3
```
* llvm's kernel loop. llvm's version needs more instructions to update the predicate registers (see the lines marked `---`)
```
.LBB0_2: // =>This Inner Loop Header: Depth=1
ld1b { z3.s }, p1/z, [x0, x8]
cmpeq p1.s, p1/z, z3.s, #0
ld1w { z3.s }, p1/z, [x1, x8, lsl #2]
add x8, x8, x10
not p3.b, p0/z, p1.b ---
fcmne p2.s, p0/z, z3.s, z1.s
mov p2.b, p3/m, p3.b ---
and p2.b, p1/z, p1.b, p2.b
whilelo p1.s, x8, x9
add z0.s, p2/m, z0.s, z2.s
b.mi .LBB0_2
```
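
To make the comparison concrete, here is a rough per-lane model of the two predicate sequences in plain C. It is only a sketch under stated assumptions: each predicate is treated as one boolean per 32-bit lane, `p0` in the llvm loop is assumed to be all-true (it is set outside the loop shown above), and the helper names (`lane_expected`, `lane_gcc`, `lane_llvm`, `count_mismatches`, `verify_models`) are hypothetical. Under those assumptions both sequences reduce to the same final predicate, which suggests the `not`/`mov` pair does not change the result.

```c
#include <assert.h>
#include <stdbool.h>

/* Intended per-lane condition: lane is active, mask[j] == 0, result[j] != 2.0. */
static bool lane_expected(bool active, bool mask_is_zero, bool res_ne_two) {
    return active && mask_is_zero && res_ne_two;
}

/* Per-lane model of gcc's predicate updates (assumed semantics). */
static bool lane_gcc(bool p1, bool mask_is_zero, bool res_ne_two) {
    bool p0 = p1 && mask_is_zero;   /* cmpeq p0, p1/z, ...       */
    p0 = p0 && res_ne_two;          /* fcmne p0, p0/z, ...       */
    p0 = p0 && p1 && p1;            /* and   p0, p0/z, p1, p1    */
    return p0;
}

/* Per-lane model of llvm's predicate updates.
   Assumption: p0 is an all-true predicate set up before the loop. */
static bool lane_llvm(bool p1_in, bool mask_is_zero, bool res_ne_two) {
    const bool p0 = true;
    bool p1 = p1_in && mask_is_zero; /* cmpeq p1, p1/z, ...      */
    bool p3 = p0 && !p1;             /* not   p3, p0/z, p1       */
    bool p2 = p0 && res_ne_two;      /* fcmne p2, p0/z, ...      */
    p2 = p3 ? p3 : p2;               /* mov   p2, p3/m, p3       */
    p2 = p1 && p1 && p2;             /* and   p2, p1/z, p1, p2   */
    return p2;
}

/* Enumerate all 8 input combinations and count lanes where either
   model disagrees with the intended condition. */
int count_mismatches(void) {
    int mismatches = 0;
    for (int bits = 0; bits < 8; bits++) {
        bool active       = (bits & 1) != 0;
        bool mask_is_zero = (bits & 2) != 0;
        bool res_ne_two   = (bits & 4) != 0;
        bool want = lane_expected(active, mask_is_zero, res_ne_two);
        if (lane_gcc(active, mask_is_zero, res_ne_two) != want) mismatches++;
        if (lane_llvm(active, mask_is_zero, res_ne_two) != want) mismatches++;
    }
    return mismatches;
}

/* Aborts if the two modelled sequences ever differ from the intended result. */
void verify_models(void) {
    assert(count_mismatches() == 0);
}
```

Calling `verify_models()` from a small `main` is enough to run the check. The model ignores element-size details of the `.b` predicate forms, so it is a reasoning aid rather than a proof about the generated code.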