http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55829
--- Comment #10 from Vladimir Makarov <vmakarov at gcc dot gnu.org> 2013-01-09 18:15:52 UTC ---
(In reply to comment #9)
> gcc now generates:
>
>         movq    p1(%rip), %r12    # 56 *movdi_internal_rex64/2 [length = 7]
>         movq    %r12, (%rsp)      # 57 *movdi_internal_rex64/4 [length = 4]
>         movddup (%rsp), %xmm1     # 23 *vec_concatv2df/3      [length = 5]
>
> is there a reason not to load directly from p1, to avoid the extra moves:
>
>         movddup p1(%rip), %xmm1

I checked the reload pass; it has the same problem (and generates even worse
code: one more insn, and it uses a nonzero displacement).  It is possible to
fix this, but it will not be easy.  In any case, I don't think it will be
fixed soon, as I have more important LRA PRs.  I'll put it on my TODO list.
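For context, a minimal C sketch of source that can produce this movddup
pattern (this is not the PR's actual testcase; the type of p1 and the
function name are assumptions), compiled with -O2 -msse3:

    #include <pmmintrin.h>

    double p1;

    __m128d
    splat_p1 (void)
    {
      /* Broadcast the scalar global into both lanes of an __m128d.
         Ideally this compiles to a single
             movddup p1(%rip), %xmm0
         rather than a movq/movq spill through the stack.  */
      return _mm_loaddup_pd (&p1);
    }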