[Bug target/94962] Suboptimal AVX2 code for _mm256_zextsi128_si256(_mm_set1_epi8(-1))

crazylht at gmail dot com Mon, 18 May 2020 03:01:25 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94962


--- Comment #4 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Jakub Jelinek from comment #2)
> But such an instruction isn't always redundant, it really depends on what
> the previous setter of the register did, whether the upper 128 bit of the
> 256-bit register are already guaranteed to be zero or not.
----
(define_insn "avx_vec_concat<mode>"
  [(set (match_operand:V_256_512 0 "register_operand" "=x,v,x,Yv")
        (vec_concat:V_256_512
          (match_operand:<ssehalfvecmode> 1 "nonimmediate_operand" "x,v,xm,vm")
          (match_operand:<ssehalfvecmode> 2 "nonimm_or_0_operand" "xm,vm,C,C")))]

(define_insn "*<extract_type>_vinsert<shuffletype><extract_suf>_0"
  [(set (match_operand:AVX512_VEC 0 "register_operand" "=v,x,Yv")
        (vec_merge:AVX512_VEC
          (match_operand:AVX512_VEC 1 "reg_or_0_operand" "v,C,C")
          (vec_duplicate:AVX512_VEC
                (match_operand:<ssequartermode> 2 "nonimmediate_operand" "vm,xm,vm"))
          (match_operand:SI 3 "const_int_operand" "n,n,n")))]

----
In both patterns, the alternatives using the "C" (constant zero) constraint
already have the upper part zeroed.
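
For reference, a minimal C sketch of the expression from the bug title. The
helper name all_ones_low_half is hypothetical, and the target attribute is
only there so the snippet builds without -mavx; it assumes a compiler that
provides _mm256_zextsi128_si256 (GCC 10 or later, for example). The intrinsic
is specified to leave the upper 128 bits zero, which is the case the "C"
alternatives above describe.

```c
#include <immintrin.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: evaluates the expression from the bug title and
   stores the 256-bit result to memory so it can be inspected.
   The low 16 bytes should be 0xFF and the high 16 bytes 0x00. */
__attribute__((target("avx")))
static void all_ones_low_half(uint8_t out[32])
{
    __m256i v = _mm256_zextsi128_si256(_mm_set1_epi8(-1));
    _mm256_storeu_si256((__m256i *)out, v);
}
```

Compiling this with -O2 -mavx2 and reading the generated assembly is one way
to see whether an extra instruction is emitted to clear the upper half.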

> Thus the #c1 patch looks incorrect to me, one would need peephole2s or some
> combine patterns or target specific pass etc. to discover that at least for
> the common cases; and it isn't something we model in the RTL patterns (what
> insns guarantee which upper bits zero and what do not; and for some there
> can be different choices even in the same define_insn, we could implement
> something using widened registers and then there would be no guarantee etc.).
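
A small C sketch of the case described in the quoted text, where the previous
setter does not leave the upper bits zero. The helper name zext_of_low_half is
hypothetical. Here the 128-bit value is the low half of a full 256-bit load,
so the zero extension has to clear the upper half and cannot be dropped.

```c
#include <immintrin.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: loads 32 bytes, takes the low 128 bits, and
   zero-extends them back to 256 bits. The upper 16 bytes of the source
   register are generally nonzero, so the zero extension is not a no-op. */
__attribute__((target("avx")))
static void zext_of_low_half(const uint8_t in[32], uint8_t out[32])
{
    __m256i x  = _mm256_loadu_si256((const __m256i *)in);
    __m128i lo = _mm256_castsi256_si128(x);      /* upper half of x still live */
    __m256i z  = _mm256_zextsi128_si256(lo);     /* must zero bits 255:128 */
    _mm256_storeu_si256((__m256i *)out, z);
}
```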
