Hello,

On Wed, 18 Mar 2026, Richard Biener wrote:

> > > That there can be (well-defined) conflicts within a scatter (WAW
> > > conflicts) does not help either.  Either a RTL representation would
> > > disallow that, but then intrinsics cannot map to this scheme, or we
> > > somehow have to deal with it.
> > > I guess it should be a black box, meaning you cannot combine or split
> > > a scatter into/from multiple scatters.
> >
> > I don't think I understand this point. When can scatters get combined?
> 
> Suppose there's a (vec_concat:V4DI (reg:V2DI) (reg:V2DI)) and
> both V2DI are from gathers.

So, that's a use of the gather results, i.e. use of loads.

> For combining scatters there might be a V4DI scatter pattern, so 
> presumably two back-to-back scatters could be combined by (vec_concat 
> ..) on the address vector of the MEM?

Now, how do scatters, i.e. writes, come into play?  Are you worried about 
combining two gather-loads plus merge plus scatter-store into a single 
gather-scatter instruction?  Well, if the backend/architecture does define 
a mem-mem (with scatter/gather, no less!) insn, then sure, a combiner 
could be tempted to try that.  I say: good!  If the target does have 
such an insn, more power to them.  Of course the usual RTL semantics must 
match: all uses (here: the gather loads) come before _all_ writes (the 
scatter stores).  If that's not the case for the target insns, then 
early-clobbers must be used on the respective operands.
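
To make the "all uses before all writes" point concrete, here is a rough
model in plain C.  It is only a sketch: the lane count and both function
names are made up for illustration and do not correspond to any real
target insn or GCC interface.

```c
#include <stddef.h>
#include <stdint.h>

#define LANES 4  /* hypothetical lane count, for illustration only */

/* Model of the usual RTL semantics: every gather load is performed
   before any scatter store.  */
void
gather_scatter_rtl (int64_t *mem, const size_t *src_idx,
                    const size_t *dst_idx)
{
  int64_t tmp[LANES];

  for (size_t i = 0; i < LANES; i++)
    tmp[i] = mem[src_idx[i]];      /* all uses first */
  for (size_t i = 0; i < LANES; i++)
    mem[dst_idx[i]] = tmp[i];      /* then all writes */
}

/* Model of a hypothetical insn that loads and stores lane by lane.
   A later lane can observe a store made by an earlier lane, so this
   does not match the RTL semantics above.  */
void
gather_scatter_interleaved (int64_t *mem, const size_t *src_idx,
                            const size_t *dst_idx)
{
  for (size_t i = 0; i < LANES; i++)
    mem[dst_idx[i]] = mem[src_idx[i]];
}
```

With source indices {0,1,2,3} and destination indices {1,2,3,0} the two
models give different memory contents.  That difference is what an
early-clobber on the respective operands would have to express for an
insn that behaves like the second model.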

> I'm saying we would need to disallow this.

I don't see that (if my interpretation of your worry is correct).

> I think we need to document exactly what a MEM of a vector address is
> in terms of a RTL abstract machine, otherwise we cannot work on it
> with generic code.

Yes, but I don't see the hardship in doing that.  Most of it is 
obvious: (MEM:VxMODE (rtl:VyPTR)) (x and y, i.e. the number of lanes, must 
match!) represents the obvious blobs in memory.  If there are overlaps in 
the blobs, the choices are:

a) target defined
b) disallowed aka undefined
c) implementation defined (bad choice)

In every case, possibly with a flag on the MEM saying "nope, I guarantee no 
overlap".
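
A rough C model of a scatter store through such a MEM may help here.
Everything in it is an assumption made for illustration: the element
type, both function names, and in particular the ascending lane order,
which is just one possible outcome under choice (a).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if no two lanes address overlapping element-sized blobs.
   This is the property a "no overlap" flag on the MEM would assert.  */
bool
lanes_disjoint (int64_t *const *addr, size_t lanes)
{
  for (size_t i = 0; i < lanes; i++)
    for (size_t j = i + 1; j < lanes; j++)
      {
        uintptr_t a = (uintptr_t) addr[i];
        uintptr_t b = (uintptr_t) addr[j];
        uintptr_t dist = a > b ? a - b : b - a;

        if (dist < sizeof (int64_t))
          return false;
      }
  return true;
}

/* One possible target-defined behaviour: lanes are stored in
   ascending order, so on a conflict the highest lane wins.  */
void
scatter_ascending (int64_t *const *addr, const int64_t *val, size_t lanes)
{
  for (size_t i = 0; i < lanes; i++)
    *addr[i] = val[i];
}
```

If lanes 0 and 2 carry the same address, this model leaves the value of
lane 2 in memory.  Under choice (b) the same input would simply be
invalid, and generic code could only rely on the result when
lanes_disjoint() holds.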

In a way it's similar to an atomic access straddling a cache line, with 
respect to atomicity guarantees: ultimately it's target dependent, and the 
compiler cannot willy-nilly invent MEMs with a different structure in such 
cases.

> > [... masking ...]
> 
> Yes, it's a representational issue (for that RTL abstract machine).

But not a new one.  It would be nice to solve it, sure, but it is 
orthogonal to MEMs of a vector address.


Ciao,
Michael.
