On 3/20/26 20:22, Richard Biener via Gcc wrote:
> On Thu, Mar 19, 2026 at 10:36 PM Michael Clark wrote:
>> at least that is how I would intend to use scatter store and gather
>> load. we are not running these memory operations LOCK synchronized. the
>> guarantee here is that vector operations run in program order and each
>> of the memory addresses is independent. that is what I want. I am
>> doing a parallel scatter/gather to/from an array of structures and it's
>> just convenient to parallel sum the stride into a vector of addresses.
>
> Yes, that's true when you are writing parallel code. For vectorization we
> have to preserve the scalar program order behavior though, which is where
> those "vector ISA" (as opposed to "thread ISA") scatter instruction
> guarantees are very useful.

or "thread-vector" ISA.

just wanted to add that a "zip" of two or four fields would be a better
example. a field from an array of structures is just an indexed
load/store. the cases where addresses are not synthesized by some
compiler primitive, but rather by user code in some "threaded-for", are
much more complicated. I'd expect unpredictable behavior if there were
overlapping addresses.
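
to make the three cases concrete, here is a minimal C sketch. the struct
and function names are made up for illustration; they are not from GCC or
from any code discussed in this thread. the first two loops touch distinct
addresses by construction. the third takes its indices from user code, so
they may overlap; written as a scalar loop, program order decides which
store wins, and that is the behavior a vectorizer has to keep.

```c
#include <stddef.h>

/* Hypothetical record type; the field names are illustrative only. */
struct point { float x, y; };

/* Field from an array of structures: a constant-stride indexed load,
   written as the scalar loop a vectorizer would see. */
void gather_x(const struct point *p, float *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = p[i].x;
}

/* "zip" of two fields: interleave separate x[] and y[] arrays into an
   array of structures. Every destination address is distinct. */
void zip_xy(const float *x, const float *y, struct point *p, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        p[i].x = x[i];
        p[i].y = y[i];
    }
}

/* Scatter with user-supplied indices. If idx[] contains duplicates,
   scalar program order means the store from the highest i is the one
   left in memory when the loop finishes. */
void scatter(float *dst, const size_t *idx, const float *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[idx[i]] = src[i];
}
```

with idx = {0, 1, 0} the scalar loop leaves src[2] in dst[0]. in a
"threaded-for" model with no ordering between lanes, either src[0] or
src[2] could end up there, which is the unpredictable case.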
Michael.