SUBREG for vectors question

Andrew Stubbs Fri, 23 Jan 2026 04:58:38 -0800

Hi all,

I've run up against an RTL generation problem for which I'm struggling to come up with a good solution.

TL;DR: I need a way to express "the high part of every lane in a vector", and "(subreg:V64SI (reg:V64DI 123) 4)" is *not* it.

I can insert temporary values and stick them together with unspecs, but that's ugly and doesn't give optimal register allocation.


Any help or suggestions would be appreciated.

Thanks

Andrew



Long version:

I have a scalar insn that looks like this:

  (set (subreg:SI (reg:DI 123) 4)
       (whatever:SI))

So it's trying to write to the high part of a DImode register. This is a common thing because this architecture (amdgcn) forms DImode values from pairs of SImode registers. After reload this will be a simple register assignment.

And I want to convert it to a vector instruction that does exactly the same thing, but 64 times in parallel:


  (set (subreg:V64SI (reg:V64DI 123) 4)
       (whatever:V64SI))

On amdgcn, V64DImode values are formed from pairs of V64SImode registers, such that all the low parts are in one register and all the high parts are in the next register, so again, after reload this will be a simple register assignment.

Except it isn't, because the SUBREG actually means a group of consecutive lanes starting at the high part of lane 0, which is not what I want. The compiler ends up writing the whole vector to the stack, and reloading some of it in an entirely broken bitwise reinterpretation of the data.


I have a temporary solution that looks like this:

  (set (reg:V64SI 789)
       (whatever:V64SI))

  (set (reg:V64DI 123)
       (unspec:V64DI
          [(reg:V64DI 123)
           (reg:V64SI 789)
           (const_int 1)]
          UNSPEC_VSUBREG))

This is logically correct, but there's really no way to ensure that the temporary register is in the high-part position (and amdgcn does require that the two parts be consecutive), so I end up with extra moves, plus of course the inscrutable UNSPEC. (It's easier for the low part, I think, where the hard register number will match, but I didn't try to optimize that either.)

Another solution might be to use a PARALLEL to list the scalar operation for every lane individually, but that's just getting silly, and I don't think the compiler will actually make any more sense of it.

I think, ideally, there'd be a "VSUBREG" operator that works on vector lanes identically to how SUBREG does for scalars, or maybe the behaviour of the regular SUBREG on vector modes could be made configurable per architecture, but adding that would mean touching a lot of things and probably isn't practical.

Is there already a way to represent what I need so that LRA will substitute the high-part register as I need it?


Thanks

Andrew
