https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96918

--- Comment #10 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Hongtao.liu from comment #9)
> Or the backend add combine helper insn to match
> 
> Failed to match this instruction:
> (set (reg:V8HI 90)
>     (rotate:V8HI (reg:V8HI 91)
>         (const_int 8 [0x8])))

The latency of the sequence in bswap_epi16 is 3, but it is 5 for vpshufb with a
memory operand. It looks to me like gcc's version is better.


bswap_epi16(short __vector(8)):
  vpsllw xmm1, xmm0, 8
  vpsrlw xmm0, xmm0, 8
  vpor xmm0, xmm1, xmm0
  ret

foo(char __vector(16)):
  vpshufb xmm0, xmm0, XMMWORD PTR .LC0[rip]
  ret
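
For reference, the equivalence between the two sequences above can be sketched in C. This is only an illustration under stated assumptions: the source that produced bswap_epi16 and foo is not shown in this comment, so the names v8hu, bswap16_shift_or, bswap16_byte_shuffle and check_equivalence are hypothetical, and the byte shuffle is modelled with a plain loop rather than an actual vpshufb. It relies on the GCC/Clang vector_size extension.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* GCC/Clang vector extension: eight unsigned 16-bit lanes in 128 bits.
   Hypothetical type name, chosen for this sketch only. */
typedef uint16_t v8hu __attribute__((vector_size(16)));

/* Rotate every 16-bit lane by 8 bits using shift-left, shift-right and OR.
   This mirrors the vpsllw / vpsrlw / vpor sequence and the
   (rotate:V8HI ... (const_int 8)) pattern quoted above. */
static v8hu bswap16_shift_or(v8hu x)
{
    const v8hu eight = {8, 8, 8, 8, 8, 8, 8, 8};
    return (x << eight) | (x >> eight);
}

/* The same result expressed as a byte permutation: swap the two bytes of
   each 16-bit lane. The control table plays the role of the .LC0 constant
   that vpshufb would load from memory (its contents are assumed here). */
static v8hu bswap16_byte_shuffle(v8hu x)
{
    static const unsigned char ctrl[16] = {
        1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14
    };
    unsigned char in[16], out[16];
    v8hu r;

    memcpy(in, &x, sizeof in);
    for (int i = 0; i < 16; i++)
        out[i] = in[ctrl[i]];
    memcpy(&r, out, sizeof r);
    return r;
}

/* Check that both formulations agree lane by lane on one sample input. */
static void check_equivalence(void)
{
    const v8hu x = {0x0102, 0x1234, 0xABCD, 0x00FF,
                    0xFF00, 0x0000, 0xFFFF, 0x8001};
    const v8hu a = bswap16_shift_or(x);
    const v8hu b = bswap16_byte_shuffle(x);

    for (int i = 0; i < 8; i++)
        assert(a[i] == b[i]);
}
```

Swapping the two bytes of a 16-bit lane does not depend on byte order, so the two functions agree on either endianness. Which form is faster depends on the target; the latency figures quoted above are the ones given in this comment.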
