https://gcc.gnu.org/bugzilla/show_bug.cgi?id=125880
--- Comment #8 from rguenther at suse dot de <rguenther at suse dot de> ---
On Mon, 22 Jun 2026, liuhongt at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=125880
>
> --- Comment #7 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
> (In reply to Hongtao Liu from comment #6)
> > > For the cases above the code comes from the vec_init expander but I can
> > > imagine this might be too early for a perfect decision.
> >
> > it comes from ix86_expand_vector_init_interleave which uses SImode for
> > V*HI/V*QImode for vec_init_0.
> >
>
> By the time we are in ix86_expand_vector_init, we don't know if the source
> is from memory or gpr.
> - For memory, pinsrw/pinsrb probably is a win.
> - For register, pinsrw/pinsrb from r32 should be worse than vmovd for port
>   pressure on Intel P-core, but ok for E-core. For Zen: pinsr* is 2u vs 1u
>   (latency-equal-ish); Zen5 gives pinsr great TP (0.25) but vmovd is still
>   fewer uops.

Yes, as said, RTL expansion is likely too early.  We'd want some
kind of peephole/splitter or an extension to STV?  Ideally
saving the GPR use before RA.
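
For reference, a minimal sketch of the two source kinds discussed above
(element 0 coming from memory vs. from a GPR), written with the GNU C vector
extension.  The function names are hypothetical and not taken from the PR's
testcase, and which instruction (pinsrw, vmovd, ...) the compiler actually
picks depends on the GCC version, -march and optimization level, so this is
only an illustration of the distinction, not a reproducer.

```c
#include <stdint.h>

/* Eight 16-bit lanes, i.e. what the backend calls V8HImode.  */
typedef int16_t v8hi __attribute__ ((vector_size (16)));

/* Element 0 is loaded from memory; the remaining lanes are zero.
   Hypothetical helper, for illustration only.  */
v8hi
init_from_mem (const int16_t *p)
{
  return (v8hi) { *p, 0, 0, 0, 0, 0, 0, 0 };
}

/* Element 0 is the result of integer arithmetic, so it most likely
   lives in a GPR when the vector is built.  Hypothetical helper.  */
v8hi
init_from_gpr (int a, int b)
{
  return (v8hi) { (int16_t) (a + b), 0, 0, 0, 0, 0, 0, 0 };
}
```

Comparing the output of something like `gcc -O2 -S` for the two functions
across -march settings should show whether the expander distinguishes the
memory and register cases at all.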
