Hello,

On Wed, 18 Mar 2026, Richard Biener wrote:

> > > That there can be (well-defined) conflicts within a scatter (WAW
> > > conflicts) does not help either. Either a RTL representation would
> > > disallow that, but then intrinsics cannot map to this scheme, or we
> > > somehow have to deal with it. I guess it should be a black box,
> > > meaning you cannot combine or split a scatter into/from multiple
> > > scatters.
> >
> > I don't think I understand this point. When can scatters get combined?
>
> Suppose there's a (vec_concat:V4DI (reg:V2DI) (reg:V2DI)) and
> both V2DI are from gathers.

So, that's a use of the gather results, i.e. a use of loads.

> For combining scatters there might be a V4DI scatter pattern, so
> presumably two back-to-back scatters could be combined by (vec_concat
> ..) on the address vector of the MEM?

Now, how do scatters, i.e. writes, come into play? Are you worried about
combining two gather-loads plus merge plus scatter-store into a single
gather-scatter instruction? Well, if the backend/architecture does define
a mem-mem (with scatter/gather, no less!) insn, then sure, a combiner could
be tempted to try that. I say: good! If the target does have such an
insn, more power to them. Of course the usual RTL semantics must match:
all uses (here: the gather loads) come before _all_ writes (the scatter
stores). If that's not the case for the target insns, then early-clobbers
must be used on the respective operands.

> I'm saying we would need to disallow this.

I don't see that (if my interpretation of your worry is correct).

> I think we need to document exactly what a MEM of a vector address is
> in terms of a RTL abstract machine, otherwise we cannot work on it
> with generic code.

Yes, but I don't see the hardship in doing that. Most of it is obvious:

  (MEM:VxMODE (rtl:VyPTR))

(x and y, i.e. the number of lanes, must match!) represents the obvious
blobs in memory.
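
To make the per-lane reading of such a MEM and the "all uses before all
writes" rule concrete, here is a minimal sketch in C. It is only a model
of the semantics under stated assumptions, not GCC code: the names
vaddr_t, vdata_t, gather, scatter and gather_scatter are made up for
illustration, and a 4-lane vector of 64-bit elements is assumed.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 4  /* assumed lane count; x == y in (MEM:VxMODE (rtl:VyPTR)) */

/* Hypothetical model types: one pointer per lane, one value per lane. */
typedef struct { int64_t *lane[LANES]; } vaddr_t;
typedef struct { int64_t lane[LANES]; } vdata_t;

/* Read of a MEM with a vector address: lane i comes from address lane i. */
static vdata_t gather(vaddr_t a)
{
    vdata_t r;
    for (int i = 0; i < LANES; i++) {
        assert(a.lane[i] != 0);
        r.lane[i] = *a.lane[i];
    }
    return r;
}

/* Write of a MEM with a vector address: lane i goes to address lane i.
   The destination lanes are assumed to be distinct here, so the order
   of the lane writes does not matter in this sketch. */
static void scatter(vaddr_t a, vdata_t v)
{
    for (int i = 0; i < LANES; i++) {
        assert(a.lane[i] != 0);
        *a.lane[i] = v.lane[i];
    }
}

/* Model of a hypothetical mem-mem insn: every lane is read before any
   lane is written, matching the usual "uses before writes" semantics. */
static void gather_scatter(vaddr_t dst, vaddr_t src)
{
    vdata_t tmp = gather(src);  /* all uses first */
    scatter(dst, tmp);          /* then all writes */
}
```

With src pointing at mem[0..3] and dst at mem[1..4], the reads-first
model shifts the four values up by one element. A lane-by-lane
read/write interleaving would instead smear mem[0] over all of them,
which is the case where the target insn would need early-clobbers.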

If there are overlaps in the blobs, the choices are:

  a) target defined
  b) disallowed aka undefined
  c) implementation defined (bad choice)

always with possibly a flag on the MEM saying "nope, I guarantee no
overlap". In a way it's similar to an atomic access straddling a cache
line, with respect to atomicity guarantees: ultimately it's target
dependent, and the compiler cannot willy-nilly invent MEMs with a
different structure in such cases.

> > [... masking ...]
>
> Yes, it's a representational issue (for that RTL abstract machine).

But not a new one. It would be nice to solve it, sure, but it is
orthogonal to MEMs of a vector address.

Ciao,
Michael.
