[Bug target/124913] [LoopVectorize][AArch64] GCC fails to vectorize reverse-loop conditional operations with forward offset that Clang vectorizes with SVE

bug_hunters at yeah dot net via Gcc-bugs Sun, 07 Jun 2026 05:28:56 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=124913


--- Comment #1 from Hunter X <bug_hunters at yeah dot net> ---
I simplified the testcase to the following:

```c
int foo_store(
    int * __restrict__ a,
    int * __restrict__ out,
    int n
) {
    for (int i = n - 1; i >= 0; i--) {
        if (a[i] != 1) {
            out[i] = a[i];
        }
    }
    return 0;
}
```

With:

```
-march=armv9-a+sve -O3 -ftree-vectorize \
-fopt-info-vec-all -fno-trapping-math \
-fvect-cost-model=unlimited
```

GCC 16.1.0 still fails to vectorize this loop.

However, if I change the loop to a forward iteration:

```c
for (int i = 0; i < n; i++)
```

GCC vectorizes the loop successfully and generates:
```
foo_store:
        cmp     w2, 0
        ble     .L2
        mov     x3, 0
        whilelo p7.s, wzr, w2
.L3:
        ld1w    z31.s, p7/z, [x0, x3, lsl 2]
        cmpne   p7.s, p7/z, z31.s, #1
        st1w    z31.s, p7, [x1, x3, lsl 2]
        incw    x3
        whilelo p7.s, w3, w2
        b.any   .L3
.L2:
        mov     w0, 0
        ret
```
Does this indicate a limitation in how the vectorizer currently handles
reverse loops with predicated (masked) stores when targeting AArch64 SVE? In
particular, is the failure related to generating the predicates or masks for
reverse iteration, or is there some other legality issue at play?

Could you clarify whether this is an intentional limitation/design choice of
the current AArch64 SVE vectorization support, or whether this should be
treated as a missed optimization that could be improved in the future?
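
For reference, here is a minimal sketch of a harness that compares the two
iteration orders on the same input. The names `store_reverse`, `store_forward`
and `variants_agree` are hypothetical and not part of the testcase above. It
only checks that both orders give the same result on the tested inputs, which
is what I would expect since each iteration touches only `a[i]` and `out[i]`;
it does not by itself establish what the vectorizer considers legal.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Reverse-iteration variant, same loop body as the reduced testcase. */
static void store_reverse(const int *restrict a, int *restrict out, int n) {
    for (int i = n - 1; i >= 0; i--) {
        if (a[i] != 1) {
            out[i] = a[i];
        }
    }
}

/* Forward-iteration variant (the one GCC vectorizes). */
static void store_forward(const int *restrict a, int *restrict out, int n) {
    for (int i = 0; i < n; i++) {
        if (a[i] != 1) {
            out[i] = a[i];
        }
    }
}

/* Hypothetical helper: returns 1 if both variants produce identical
 * output for a pseudo-random input of length n, and 0 otherwise
 * (including on allocation failure). */
int variants_agree(int n, unsigned seed) {
    assert(n >= 0);
    if (n == 0) {
        return 1; /* nothing to compare */
    }

    size_t bytes = (size_t)n * sizeof(int);
    int *a = malloc(bytes);
    int *out_rev = malloc(bytes);
    int *out_fwd = malloc(bytes);
    int same = 0;

    if (a && out_rev && out_fwd) {
        srand(seed);
        for (int i = 0; i < n; i++) {
            a[i] = rand() % 3;  /* values 0..2, so some elements equal 1 */
            out_rev[i] = -1;    /* sentinel marks elements left unwritten */
            out_fwd[i] = -1;
        }
        store_reverse(a, out_rev, n);
        store_forward(a, out_fwd, n);
        same = (memcmp(out_rev, out_fwd, bytes) == 0);
    }

    free(a);
    free(out_rev);
    free(out_fwd);
    return same;
}
```

Calling `variants_agree(1000, 42u)` from a small `main` should return 1 if the
two orders are interchangeable for this loop body.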
