https://gcc.gnu.org/bugzilla/show_bug.cgi?id=125880

--- Comment #9 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
(In reply to [email protected] from comment #8)
> On Mon, 22 Jun 2026, liuhongt at gcc dot gnu.org wrote:
> 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=125880
> > 
> > --- Comment #7 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
> > (In reply to Hongtao Liu from comment #6)
> > > > For the cases above the code comes from the vec_init expander but I can
> > > > imagine this might be too early for a perfect decision.
> > > 
> > > it comes from ix86_expand_vector_init_interleave, which uses SImode for
> > > V*HI/V*QImode for vec_init_0.
> > >
> > 
> > By the time we are in ix86_expand_vector_init, we don't know whether the
> > source comes from memory or a GPR.
> > - For memory, pinsrw/pinsrb is probably a win.
> > - For a register, pinsrw/pinsrb from r32 should be worse than vmovd for port
> > pressure on Intel P-cores, but OK on E-cores. For Zen: pinsr* is 2 uops vs 1
> > uop (latency-equal-ish); Zen5 gives pinsr* great throughput (0.25), but
> > vmovd is still fewer uops.
> 
> Yes, as said, RTL expansion is likely too early.  We'd want some kind of
> peephole/splitter, or an extension to STV?  Ideally saving the GPR
> use before RA.

Maybe add a define_split for the specific patterns generated by vec_init, e.g.
for this failed combine attempt:

Trying 57, 59 -> 62:
   57: r204:HI=[r98:DI]
   59: r205:V4SI=vec_merge(vec_duplicate(r204:HI#0),const_vector,0x1)
      REG_DEAD r204:HI
   62: r206:V8HI=vec_merge(vec_duplicate([r300:DI*0x2+r98:DI]),r205:V4SI#0,0x2)
      REG_DEAD r205:V4SI
Failed to match this instruction:
(set (reg:V8HI 206)
    (vec_merge:V8HI (subreg:V8HI (vec_merge:V4SI (vec_duplicate:V4SI (subreg:SI (mem:HI (reg:DI 98 [ ivtmp.30 ]) [1 MEM[(short int *)_28]+0 S2 A16]) 0))
                (const_vector:V4SI [
                        (const_int 0 [0]) repeated x4
                    ])
                (const_int 1 [0x1])) 0)
        (vec_duplicate:V8HI (mem:HI (plus:DI (mult:DI (reg:DI 300 [ _109 ])
                        (const_int 2 [0x2]))
                    (reg:DI 98 [ ivtmp.30 ])) [1 MEM[(short int *)_28 + _48 * 2]+0 S2 A16]))
        (const_int 253 [0xfd])))
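
To make it clearer what such a splitter would have to preserve, here is a rough
scalar C model of the combined pattern as I read it. This is only a sketch: the
helper names vec_merge_v8hi and model_combined are made up for illustration,
the 0xdead filler stands in for the upper half of the paradoxical SImode
subreg (which I am assuming carries no defined value), and little-endian lane
numbering is assumed for the V4SI -> V8HI subreg.

```c
#include <stdint.h>

/* Hypothetical helper modelling RTL vec_merge on eight 16-bit lanes:
   if bit i of MASK is set, lane i comes from A, otherwise from B.  */
static void vec_merge_v8hi(uint16_t out[8], const uint16_t a[8],
                           const uint16_t b[8], unsigned mask)
{
    for (int i = 0; i < 8; i++)
        out[i] = ((mask >> i) & 1) ? a[i] : b[i];
}

/* Hypothetical model of the combined insn: ELT0 is the first HImode load,
   ELT1 the second one.  */
static void model_combined(uint16_t out[8], uint16_t elt0, uint16_t elt1)
{
    /* Inner V4SI vec_merge with mask 0x1: lane 0 is the SImode view of
       ELT0, lanes 1..3 come from the zero const_vector.  The upper 16 bits
       of lane 0 are filled with an arbitrary marker (assumption: they are
       not meaningful).  */
    uint32_t v4si[4] = { 0xdead0000u | elt0, 0, 0, 0 };

    /* (subreg:V8HI ... 0): view the four 32-bit lanes as eight 16-bit
       lanes, low half first (little-endian lane order assumed).  */
    uint16_t first[8];
    for (int i = 0; i < 4; i++) {
        first[2 * i]     = (uint16_t)(v4si[i] & 0xffffu);
        first[2 * i + 1] = (uint16_t)(v4si[i] >> 16);
    }

    /* vec_duplicate:V8HI of the second load.  */
    uint16_t dup[8];
    for (int i = 0; i < 8; i++)
        dup[i] = elt1;

    /* Outer vec_merge with mask 0xfd: only lane 1 is taken from DUP, which
       also overwrites the marker in the upper half of SImode lane 0.  */
    vec_merge_v8hi(out, first, dup, 0xfd);
}
```

With this model, model_combined(v, 0x1111, 0x2222) leaves v as
{0x1111, 0x2222, 0, 0, 0, 0, 0, 0}, i.e. the two loaded elements in lanes 0 and
1 and zeros elsewhere.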
