https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80833

--- Comment #11 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Peter Cordes from comment #0)
> A lower-latency xmm->int strategy would be:
> 
>         movd    %xmm0, %eax
>         pextrd  $1, %xmm0, %edx

The proposed patch implements the above strategy for generic moves.
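
For illustration, a generic DImode xmm->int split would then look along
these lines (a sketch; the "before" sequence assumes the pre-patch
fallback bounced the value through a scratch stack slot):

        # before (sketch): store/reload through a reserved stack slot,
        # which costs a store-forwarding round trip
        movq    %xmm0, (%esp)
        movl    (%esp), %eax
        movl    4(%esp), %edx

        # after: direct register moves; %xmm0 is left intact
        movd    %xmm0, %eax
        pextrd  $1, %xmm0, %edx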

> Or without SSE4, -mtune=sandybridge (anything that excludes Nehalem and other
> CPUs where an FP shuffle has bypass delay between integer ops)
> 
>         movd     %xmm0, %eax
>         movshdup %xmm0, %xmm0  # saves 1B of code-size vs. psrldq, I think.
>         movd     %xmm0, %edx
> 
> Or without SSE3,
> 
>         movd     %xmm0, %eax
>         psrldq   $4,  %xmm0    # 1 m-op cheaper than pshufd on K8
>         movd     %xmm0, %edx

The above two proposals are not suitable for generic moves: a generic move
must not clobber the input value, and it is not allowed to use a temporary
register.
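
To make either of those sequences non-destructive, the shuffle would have
to write into a scratch register instead, e.g. (a sketch; %xmm1 stands for
a temporary the generic move pattern has no way to request):

        movd     %xmm0, %eax
        pshufd   $0xe5, %xmm0, %xmm1   # copy element 1 into the scratch; %xmm0 preserved
        movd     %xmm1, %edx

pextrd avoids both problems: it reads %xmm0 without modifying it and needs
no temporary, which is why only the SSE4.1 variant fits the generic move.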
