https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116338
T G <tanmaygulhane12 at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |tanmaygulhane12 at gmail dot
com
--- Comment #6 from T G <tanmaygulhane12 at gmail dot com> ---
Still reproduces on riscv64.
GCC 15.2.0 (Ubuntu cross, riscv64-linux-gnu), -O3 -march=rv64gcv -mabi=lp64d:
not vectorized, "unsupported use in stmt" (same analysis failure as comment 5).
GCC trunk 17.0.0 20260610: [FILL IN after step 4: same result / changed]
clang 22.1.7, --target=riscv64-linux-gnu -march=rv64gcv -O3: vectorizes,
vectorization width vscale x 4, using vslideup for the recurrence splice.
I also verified correctness of clang's vectorized RVV output under
qemu-riscv64: results are bit-identical to GCC -O0 over 32000 elements
(FNV-1a over the raw float bits). So the transformation is legal, and the
hardware support needed to make it profitable (vslideup) exists.
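
For reference, the bit-level comparison can be sketched roughly as below. This
is a minimal illustration rather than the exact harness used: the 64-bit FNV-1a
variant and the names fnv1a64 and hash_floats are assumptions made for the
example. Hashing the raw bytes instead of comparing float values means that
differences such as 0.0f versus -0.0f are not hidden.

```c
#include <stddef.h>
#include <stdint.h>

/* 64-bit FNV-1a over an arbitrary byte range.
 * The 64-bit variant is an assumption; the comment above only says "FNV-1a". */
uint64_t fnv1a64(const void *data, size_t len)
{
    const unsigned char *p = data;
    uint64_t h = 0xcbf29ce484222325ULL;   /* FNV-1a 64-bit offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];                         /* xor in the next byte ... */
        h *= 0x100000001b3ULL;             /* ... then multiply by the FNV prime */
    }
    return h;
}

/* Hypothetical helper: hash the object representation of n floats, so two
 * output arrays hash equal only if they are byte-for-byte identical. */
uint64_t hash_floats(const float *a, size_t n)
{
    return fnv1a64(a, n * sizeof *a);
}
```

Running the kernel once per build and comparing hash_floats over the output
array a is then enough to flag any single-bit difference between the two.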
Reduced reproducer, pointer form, fails identically to the global form:
void s255(float *restrict a, const float *restrict b, int n)
{
    float x = b[n-1], y = b[n-2];
    for (int i = 0; i < n; i++) {
        a[i] = (b[i] + x + y) * 0.333f;
        y = x;
        x = b[i];
    }
}
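
To illustrate why the recurrence is harmless, here is a hand-peeled sketch
(not a proposal for how the vectorizer should handle it). After the first two
iterations, x and y are just b[i-1] and b[i-2], so the loop body carries no
dependence at all. The name s255_peeled is made up for this example, and like
the reproducer it assumes n >= 2. The additions are kept in the same order, so
the results should match the original bit for bit.

```c
/* Reproducer, as in the comment above. */
void s255(float *restrict a, const float *restrict b, int n)
{
    float x = b[n-1], y = b[n-2];
    for (int i = 0; i < n; i++) {
        a[i] = (b[i] + x + y) * 0.333f;
        y = x;
        x = b[i];
    }
}

/* Hypothetical hand-peeled equivalent; assumes n >= 2, as s255 does
 * (it reads b[n-2] before the loop). */
void s255_peeled(float *restrict a, const float *restrict b, int n)
{
    /* i = 0: x = b[n-1], y = b[n-2] (the wrap-around initial values) */
    a[0] = (b[0] + b[n-1] + b[n-2]) * 0.333f;
    /* i = 1: x = b[0], y = b[n-1] */
    a[1] = (b[1] + b[0] + b[n-1]) * 0.333f;
    /* i >= 2: x = b[i-1], y = b[i-2], plain shifted loads of b */
    for (int i = 2; i < n; i++)
        a[i] = (b[i] + b[i-1] + b[i-2]) * 0.333f;
}
```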