Issue 61579
Summary [optimization] gcc generates better code than clang based on predicated SVE
Labels new issue
Assignees
Reporter vfdff
    * test case: https://gcc.godbolt.org/z/d3n1aPanv
```
int check(char* mask, float *result, int n) {
    int count = 0;
    for(int j = 0; j < n; j++){
        if((mask[j] == 0) && (result[j] != 2.0)){
            count++;
        }
    }
    return count;
}
```
*  gcc's kernel loop
```
.L3:
        ld1b    z1.s, p1/z, [x0, x3]
        cmpeq   p0.b, p1/z, z1.b, #0
        ld1w    z1.s, p0/z, [x1, x3, lsl 2]
        add     x3, x3, x4
        fcmne   p0.s, p0/z, z1.s, z3.s
        and     p0.b, p0/z, p1.b, p1.b
        whilelo p1.s, w3, w2
        add     z0.s, p0/m, z0.s, z2.s
        b.any .L3
```
* llvm's kernel loop; llvm's version needs extra instructions (marked with ---) to update the predicate registers
```
.LBB0_2: // =>This Inner Loop Header: Depth=1
        ld1b    { z3.s }, p1/z, [x0, x8]
        cmpeq   p1.s, p1/z, z3.s, #0
        ld1w    { z3.s }, p1/z, [x1, x8, lsl #2]
        add     x8, x8, x10
        not     p3.b, p0/z, p1.b        ---
        fcmne   p2.s, p0/z, z3.s, z1.s
        mov     p2.b, p3/m, p3.b        ---
        and     p2.b, p1/z, p1.b, p2.b
        whilelo p1.s, x8, x9
        add     z0.s, p2/m, z0.s, z2.s
        b.mi    .LBB0_2
```
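* a lane-level sketch of the two predicate computations, to illustrate why the marked `not`/`mov` pair looks removable. This is only a model, not compiler output: the function names `gcc_lane`, `llvm_lane` and `check_all_lanes` are made up for illustration, each predicate is reduced to one boolean per lane (element sizes are ignored), and `p0` in llvm's loop is assumed to be all-true because its setup is outside the loop shown above.
```c
#include <assert.h>
#include <stdbool.h>

/* Inputs for one lane (names are hypothetical):
 *   active    - lane enabled by whilelo (p1 at loop entry)
 *   mask_zero - mask[j] == 0
 *   ne        - result[j] != 2.0
 */

/* Model of the final predicate in gcc's loop. */
bool gcc_lane(bool active, bool mask_zero, bool ne) {
    bool p0 = active && mask_zero;  /* cmpeq p0, p1/z, ...      */
    p0 = p0 && ne;                  /* fcmne p0, p0/z, ...      */
    return p0 && active;            /* and   p0, p0/z, p1, p1   */
}

/* Model of the final predicate in llvm's loop (p0 assumed all-true). */
bool llvm_lane(bool active, bool mask_zero, bool ne) {
    bool p1 = active && mask_zero;  /* cmpeq p1, p1/z, ...      */
    bool p3 = !p1;                  /* not   p3, p0/z, p1       */
    bool p2 = ne;                   /* fcmne p2, p0/z, ...      */
    if (p3) p2 = p3;                /* mov   p2, p3/m, p3 (merging) */
    return p1 && (p1 && p2);        /* and   p2, p1/z, p1, p2   */
}

/* Compare both models on every input combination; returns the
 * number of combinations checked. */
int check_all_lanes(void) {
    int checked = 0;
    for (int bits = 0; bits < 8; bits++) {
        bool active    = (bits & 1) != 0;
        bool mask_zero = (bits & 2) != 0;
        bool ne        = (bits & 4) != 0;
        assert(gcc_lane(active, mask_zero, ne) ==
               llvm_lane(active, mask_zero, ne));
        checked++;
    }
    return checked;
}
```
Under these assumptions both sequences reduce to `active && mask_zero && ne` in every lane, since `p1 && (ne || !p1)` simplifies to `p1 && ne`; the final `and` already masks off the lanes that the `not`/`mov` pair forces to true.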