On 19/03/2026 08:25, Richard Biener wrote:
> That said - the *scatter* optabs assume naive vectorization of the
> first loop works, even when b[] = { 0, 1, 2, 0 }, so if GCN is not able
> to guarantee this their "vector address store" are not scatters in
> terms of what GCC assumes. The documentation for the optabs
> does not mention this constraint.
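
To make the duplicate-index case concrete, here is a minimal C sketch. It assumes the "first loop" has the form a[b[i]] = c[i], which is not shown in this message. The function names and the values in c[] are hypothetical, and the reverse-order function is only a model of one outcome an unordered store could produce, not a description of what GCN hardware does.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar reference: iterations run in order, so when two iterations
   hit the same index the later iteration's value survives. */
void scatter_in_order(int *a, size_t a_len,
                      const size_t *b, const int *c, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        assert(b[i] < a_len);   /* index must be in bounds */
        a[b[i]] = c[i];
    }
}

/* Hypothetical model of a store with no ordering between lanes:
   committing lanes in reverse order is one permitted outcome when
   the order is unspecified. */
void scatter_reverse_lanes(int *a, size_t a_len,
                           const size_t *b, const int *c, size_t n)
{
    for (size_t i = n; i-- > 0; ) {
        assert(b[i] < a_len);   /* index must be in bounds */
        a[b[i]] = c[i];
    }
}
```

With b[] = { 0, 1, 2, 0 } and c[] = { 10, 20, 30, 40 }, the in-order version leaves a[0] == 40 while the reverse-lane model leaves a[0] == 10. The other elements agree.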
The primary use-case for the GCN port is OpenMP/OpenACC in which loop
iterations are considered to be "independent" and therefore all such
considerations can be ignored. Not only is vectorization in play, but
also two levels of threading, so there is absolutely no guarantee of the
order in which operations happen. If the user writes code that is not, in fact,
"independent" then that's on them.
There have indeed been a few occasions where GCC has refused to optimize
because it would not preserve "correctness" even though all hope of that
correctness has already gone.
We "fixed" the floating-point reduction case by implementing "fold_left"
optabs that actually do not strictly fold left, albeit only when
-fopenmp is active. Consequently, the result of floating-point vector
reductions is stable, but it's not the same stable you'd get from the
unvectorized loop. (The result of the outer OpenMP reduction loop, as a
whole, is unstable, because the threads complete out of order.)
Basically -fopenmp implies -fassociative-math, in this case.
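
As an illustration of "stable, but not the same stable", here is a small C sketch comparing a strict left fold with a lane-wise sum. The two-lane layout and the function names are hypothetical, chosen only to keep the example short. It does not reflect the actual GCN vector width or the optab implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Strict left fold: ((x0 + x1) + x2) + ..., as the scalar loop does. */
double sum_fold_left(const double *x, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += x[i];
    return acc;
}

/* Hypothetical two-lane model: each lane accumulates every other
   element, and the lanes are combined once at the end.  The result is
   deterministic, but the additions are associated differently. */
double sum_two_lanes(const double *x, size_t n)
{
    assert(n % 2 == 0);         /* sketch handles even lengths only */
    double lane0 = 0.0, lane1 = 0.0;
    for (size_t i = 0; i < n; i += 2) {
        lane0 += x[i];
        lane1 += x[i + 1];
    }
    return lane0 + lane1;
}
```

For x[] = { 1e17, 1.0, -1e17, 1.0 } with IEEE 754 doubles and no excess precision or fast-math reassociation, the left fold returns 1.0 (the first 1.0 is absorbed by 1e17) and the two-lane model returns 2.0. Both are repeatable from run to run, but they differ from each other.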
If necessary, we'd do the same thing for scatter_store.
Andrew