https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68961
--- Comment #11 from Richard Biener <rguenth at gcc dot gnu.org> ---
Note that the fix depends on a "bogus" cost for the vector construction on
x86_64.
Currently it is two stmts (nunits / 2 + 1) but the vector can be constructed
by a single unpcklpd stmt. The correct cost is nunits - 1.
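To make the arithmetic concrete, here is a minimal sketch of the two formulas for the V2DF case (nunits == 2). The helper names are hypothetical and are not GCC functions; they only evaluate the two expressions quoted above.

```c
#include <assert.h>

/* Hypothetical helpers for illustration only, not part of GCC.
   nunits is the number of vector lanes; a vector has at least two.  */

/* Cost formula described above as the current one.  */
static int construct_cost_current(int nunits)
{
    assert(nunits >= 2);
    return nunits / 2 + 1;
}

/* Cost formula described above as the correct one.  */
static int construct_cost_proposed(int nunits)
{
    assert(nunits >= 2);
    return nunits - 1;
}
```

For nunits == 2 the first formula gives 2 stmts and the second gives 1, which matches the single unpcklpd.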
As in the PPC case, the main issue is that the exact overlap of the incoming
registers with the return value registers is hidden from the GIMPLE IL:
pack (double a, double aa)
{
  struct x D.1756;

  <bb 2>:
  MEM[(struct x *)&D.1756] = a_2(D);
  MEM[(struct x *)&D.1756 + 8B] = aa_3(D);
  return D.1756;
}
Detecting the exact overlap is probably too hard. But it should at least be
possible to detect that we don't return in memory, so the store is not really
a store, and that we return in two different regs, so two vector extractions
are required.
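For reference, a C function along the following lines would plausibly lower to the GIMPLE dump above. The layout of struct x is an assumption inferred from the two stores at offsets 0 and 8; the actual testcase in the PR may differ.

```c
/* Assumed layout: two doubles, 16 bytes total, matching the stores
   at byte offsets 0 and 8 in the dump.  */
struct x
{
    double d[2];
};

/* Packs two scalar arguments into the aggregate return value.  */
struct x pack(double a, double aa)
{
    struct x r;
    r.d[0] = a;   /* corresponds to MEM[&D.1756]      = a_2(D)  */
    r.d[1] = aa;  /* corresponds to MEM[&D.1756 + 8B] = aa_3(D) */
    return r;
}
```

Under the x86_64 System V ABI, a and aa arrive in xmm0 and xmm1, and a struct of two doubles is returned in xmm0 and xmm1 as well, so ideally no code is needed at all.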