https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80833
--- Comment #11 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Peter Cordes from comment #0)
> A lower-latency xmm->int strategy would be:
>
>     movd   %xmm0, %eax
>     pextrd $1, %xmm0, %edx

Proposed patch implements the above for generic moves.

> Or without SSE4 -mtune=sandybridge (anything that excluded Nehalem and other
> CPUs where an FP shuffle has bypass delay between integer ops)
>
>     movd     %xmm0, %eax
>     movshdup %xmm0, %xmm0   # saves 1B of code-size vs. psrldq, I think.
>     movd     %xmm0, %edx
>
> Or without SSE3,
>
>     movd   %xmm0, %eax
>     psrldq $4, %xmm0        # 1 m-op cheaper than pshufd on K8
>     movd   %xmm0, %edx

The above two proposals are not suitable for generic moves: we should not
clobber the input value, and we are not allowed to use a temporary register.
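
For illustration only (a sketch, not part of the proposed patch): the closest
SSE2-only sequence that preserves the input has to shuffle into a scratch XMM
register, which a generic move pattern cannot assume is available:

    movd   %xmm0, %eax
    pshufd $0xe5, %xmm0, %xmm1   # copy dword 1 of %xmm0 into the low dword of %xmm1
    movd   %xmm1, %edx           # %xmm0 is preserved, but %xmm1 is clobbered

With SSE4.1, pextrd reads the high dword directly into a GPR without touching
%xmm0, so the movd + pextrd sequence quoted above needs neither an in-place
shuffle nor a scratch register, which is what makes it usable for generic moves.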