https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117562
--- Comment #9 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Hongtao Liu from comment #8)
> > vec_unpacks_hi_v4sf create an uninitialized (reg:V4SF 853), I guess it may
> > confuse LRA to allocate a mem for it.
>
> For simple case
> void
> foo (double* a, float* b, int n)
> {
>   for (int i = 0; i != n; i++)
>     a[i] = b[i];
> }
>
> RA works ok, there's no extra spill there.
Yeah, it needs enough register pressure to not have the extra reg here.
I think the proposed patch in comment#7 might be good on its own as it
avoids a false dependence on prior register contents (if not optimizing
for size).
It does fix the benchmark regression as well.
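
To make the false dependence concrete, here is a minimal scalar model in portable C. It is only a sketch under stated assumptions: the type v4sf and the functions model_movhlps, model_cvtps2pd and self_check are hypothetical names made up for this illustration, not GCC internals, and the patch from comment #7 is not reproduced here. The model assumes the high-half unpack is done with a movhlps-style move followed by a conversion that reads only the low two lanes.

```c
#include <assert.h>

/* Hypothetical 4 x float model of an SSE register (illustration only). */
typedef struct { float f[4]; } v4sf;

/* Model of MOVHLPS dest, src: the high two floats of src are copied to the
   low two lanes of dest; the high two lanes of dest keep their old values,
   so the instruction reads dest as well as src.  */
static v4sf
model_movhlps (v4sf dest, v4sf src)
{
  dest.f[0] = src.f[2];
  dest.f[1] = src.f[3];
  return dest;
}

/* Model of CVTPS2PD: only the low two floats of src are read.  */
static void
model_cvtps2pd (v4sf src, double out[2])
{
  out[0] = src.f[0];
  out[1] = src.f[1];
}

/* The converted high half is the same whatever dest held before, so the
   read of the old dest contents is a false dependence for this use.  */
static void
self_check (void)
{
  v4sf b = { { 1.0f, 2.0f, 3.0f, 4.0f } };
  v4sf stale = { { 9.0f, 9.0f, 9.0f, 9.0f } };
  v4sf zero = { { 0.0f, 0.0f, 0.0f, 0.0f } };
  double from_stale[2], from_zero[2];

  model_cvtps2pd (model_movhlps (stale, b), from_stale);
  model_cvtps2pd (model_movhlps (zero, b), from_zero);

  assert (from_stale[0] == 3.0 && from_stale[1] == 4.0);
  assert (from_zero[0] == from_stale[0] && from_zero[1] == from_stale[1]);
}
```

In this model the upper lanes of the destination are carried along but never consumed, which is why giving the destination a defined value (or not reusing a live register for it) changes nothing in the result while removing the wait on whatever last wrote that register.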
I do wonder about the usefulness of the memory alternative on the
sse_movhlps pattern though.  There's the sse_storehps pattern, which
models the store part more precisely as V2SFmode.  Is
sse_movhlps_exp ever invoked with a memory destination?
That said, if the memory alternative stays we might want to mark it with '$'
so it's never chosen when the memory operand needs a reload?