On 17/03/2026 14:44, Richard Biener wrote:
On Tue, Mar 17, 2026 at 3:12 PM Michael Matz <[email protected]> wrote:
Hello,
On Tue, 17 Mar 2026, Richard Biener via Gcc wrote:
The issue is that (mem:<vectype> (reg:<vectype>)) does not play
nicely with the idea that a (mem:...) accesses contiguous memory
That's the big thing indeed. If it were only MEM_ATTRs the solution is
simple: assert that there aren't any on those vMEMs (or as Andrew suggests
later, only a sane subset). I think there are more places that
conceptually assume such a contiguous access, like disambiguation and
similar, without MEM_ATTRs, in the sense that if those places think they
have figured out a lower bound of the base address and an upper bound of
the access size, then they assume that nothing outside that range is
accessed.
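
The contiguity assumption described above can be sketched in a few lines. This is only an illustration in Python, not GCC code: the function names, the lane addresses, and the element size are all made up for the example. It compares a range-based overlap test, which treats the vMEM as one block starting at lane 0's address, with a test that checks each lane.

```python
def contiguous_overlap(base_a, size_a, base_b, size_b):
    """Classic range test: do [base_a, base_a+size_a) and
    [base_b, base_b+size_b) intersect?"""
    return base_a < base_b + size_b and base_b < base_a + size_a


def gather_overlap(lane_addrs, elem_size, base_b, size_b):
    """Per-lane test: a vMEM touches one elem_size range per lane address."""
    return any(contiguous_overlap(a, elem_size, base_b, size_b)
               for a in lane_addrs)


# Hypothetical 4-lane gather of 4-byte elements (addresses are invented).
lane_addrs = [0x1000, 0x2000, 0x1004, 0x3000]
elem_size = 4

# A scalar store to 0x2000, far away from lane 0's address.
store_base, store_size = 0x2000, 4

# Treating the vMEM as contiguous: 16 bytes starting at lane 0's address.
naive = contiguous_overlap(lane_addrs[0], elem_size * len(lane_addrs),
                           store_base, store_size)

# Checking every lane individually.
precise = gather_overlap(lane_addrs, elem_size, store_base, store_size)

print(naive, precise)
```

The range-based test reports no conflict, while the per-lane test finds that lane 1 hits the stored address. Any pass that derives "lower bound plus size" for a MEM would need the second kind of reasoning for a vMEM, or would have to give up.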
That there can be (well-defined) conflicts within a scatter (WAW conflicts) does
not help either. Either an RTL representation would disallow that, but then
intrinsics cannot map to this scheme, or we somehow have to deal with it.
I guess it should be a black box, meaning you cannot combine or split
a scatter into/from multiple scatters.
I don't think I understand this point. When can scatters get combined?
How is this different from the existing scatter_load patterns? Or is it
just that those are unspecs that have never allowed transformations
outside the backend?
Or is it just that scalar MEM can get combined and the optimizer might
try the same thing with vector MEM?
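
For what it is worth, the WAW concern can be sketched as follows. This is a toy model in Python, not a statement about any particular ISA: it assumes lanes are stored in increasing lane order, so the highest lane wins on a duplicate address, and the `scatter` helper is hypothetical.

```python
def scatter(memory, addrs, values):
    """Store values[i] to memory[addrs[i]], lanes in increasing order.

    Assumption for this sketch: on a duplicate address the highest lane wins.
    Returns a new dict so the input memory is left untouched.
    """
    out = dict(memory)
    for a, v in zip(addrs, values):
        out[a] = v
    return out


mem = {0: 0, 1: 0, 2: 0}
addrs = [0, 1, 0, 2]        # lanes 0 and 2 both target address 0
values = [10, 11, 12, 13]

# One scatter over all four lanes.
whole = scatter(mem, addrs, values)

# Split into two scatters issued in lane order: same result.
in_order = scatter(scatter(mem, addrs[:2], values[:2]),
                   addrs[2:], values[2:])

# Same two halves issued in the opposite order: lane 0 overwrites lane 2.
swapped = scatter(scatter(mem, addrs[2:], values[2:]),
                  addrs[:2], values[:2])

print(whole, in_order, swapped)
```

Under that assumption a split is only safe if the pieces stay in lane order, and a combine is only safe if the merged lane order matches the original store order. Treating the scatter as a black box sidesteps both questions.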
I'm not proposing any changes to the middle-end representation or
features, nor any enforced changes to backend capabilities. The change
in representation will primarily be more convenient, and if it adds more
expressiveness for the future then good.
I *am* asking what the unintended consequences might be, so if this is
one of those then thank you.
I also think that all those could be fixed as well (e.g. by giving up).
Furthermore I think we somewhen do need a proper representation of the
concept behind scatter/gather MEMs, where "proper" is not "an RTL
vec_concat of MEMs". If we went that vec_concat route when vector modes
were introduced and we had represented vector REGs as a vec_concat as
well, instead of the current top-level RTL REG, we would all be mad by
now.
So, IMHO a top-level construct for "N-lane MEM access with N-lane
addresses" is the right thing to do (and was, for a long time). The only
question is specifics: should it be a MEM, or a new top-level code?
Should the only difference between a MEM as-of-now and the vMEM be the
fact that the address has a vector mode? Or flags on the MEM?
(IMHO: MEM with vMODE addresses is enough, but see below for a case of
new toplevel code).
Which transformations should be allowed to be represented within the
addresses? Should it only be a vMODE REG? Could it be more, like the
scalar offset that's added to all lanes that Andrew's architecture would
have, or a scalar scale that's multiplied to each lane? How to represent
that? If the vMEM would be a separate top-level RTL, it could have two
slots, one for the base addresses (vMODE), and one for an arithmetic
scalar transform applied to each lane (word_mode). With a MEM that's more
complicated and would somehow have to be wrapped in the vMODE address.
But the latter might be convenient in other places as well, for instance
when calculating such address vector without actual memory access.
And so on...
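
The two-slot idea above (a vector of per-lane bases plus a scalar transform applied to every lane) can be sketched like this. It is a Python illustration only: `lane_addresses`, `scale`, and `offset` are hypothetical names, and "multiply by a scalar scale, then add a scalar offset" is just one possible form of the transform.

```python
def lane_addresses(bases, scale=1, offset=0):
    """Compute the effective address of each lane.

    bases  : per-lane values (the vMODE slot)
    scale  : scalar multiplied into every lane (assumed transform)
    offset : scalar added to every lane (assumed transform)
    """
    return [b * scale + offset for b in bases]


# Hypothetical example: four element indices, 4-byte elements,
# and a common scalar base of 0x1000.
indices = [0, 3, 1, 7]
addrs = lane_addresses(indices, scale=4, offset=0x1000)

print([hex(a) for a in addrs])
```

Written as a standalone function, the same computation also covers the case mentioned above of calculating the address vector without performing a memory access.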
But I think if Andrew wants to put in the work to make this ... well,
work, then it would be good for GCC.
I think the recent discussion on how to represent (len-)masking and else
values also comes into play here, given that we at least have masked
variants of gathers and scatters.
I've not been following this (the "len" stuff is not usually relevant to
GCN), so I'm not completely sure which issue you're referring to.
I agree that masking is an issue here, because ...
  (set (mem ....)
       (vec_merge
         (src)
         (mem .... "0")
         (mask)))
... is potentially different to ...
  (set (reg 123)
       (mem ...))
  (set (reg 123)
       (vec_merge
         (src)
         (reg 123)
         (mask)))
  (set (mem ...)
       (reg 123))
However, this was already an issue for contiguous vectors, so while it
could be a new problem for GCN (good catch!), surely this is an old
problem on other architectures?
(This is not currently a problem on GCN because maskload gives an
unbreakable UNSPEC.)
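
One way the two forms above can differ is sketched below. This is a toy model in Python, not RTL semantics as GCC defines them: memory is a dict, a missing key stands in for an unmapped address, and the helper names are invented. The single masked store touches only the active lanes, whereas the load/merge/store sequence reads and writes every lane.

```python
def vec_merge(a, b, mask):
    """Lane i comes from a where the mask bit is set, else from b."""
    return [x if m else y for x, y, m in zip(a, b, mask)]


def masked_scatter(memory, addrs, src, mask):
    """First form, taken as one unit: only active lanes touch memory."""
    for a, v, m in zip(addrs, src, mask):
        if m:
            memory[a] = v


def load_merge_store(memory, addrs, src, mask):
    """Second form: full gather, merge in a register, full scatter."""
    reg = [memory[a] for a in addrs]   # KeyError models a fault here
    reg = vec_merge(src, reg, mask)
    for a, v in zip(addrs, reg):
        memory[a] = v


addrs = [100, 104, 999]          # address 999 is not mapped in this model
src = [7, 8, 9]
mask = [True, True, False]       # the lane pointing at 999 is inactive

mem_a = {100: 1, 104: 2}
masked_scatter(mem_a, addrs, src, mask)

mem_b = {100: 1, 104: 2}
try:
    load_merge_store(mem_b, addrs, src, mask)
    faulted = False
except KeyError:
    faulted = True

print(mem_a, faulted)
```

In this model the masked store completes and the split sequence faults on the inactive lane. Even without a fault, the split sequence rewrites inactive lanes with the values it read earlier, which matters if something else can write those locations in between.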
Andrew