on 2021/8/24 下午9:02, Segher Boessenkool wrote: > Hi Ke Wen, > > On Mon, Aug 09, 2021 at 10:53:00AM +0800, Kewen.Lin wrote: >> on 2021/8/6 下午9:10, Bill Schmidt wrote: >>> On 8/4/21 9:06 PM, Kewen.Lin wrote: >>>> The existing vec_unpacku_{hi,lo} supports emulated unsigned >>>> unpacking for short and char but misses the support for int. >>>> This patch adds the support for vec_unpacku_{hi,lo}_v4si. > >> * config/rs6000/altivec.md (vec_unpacku_hi_v16qi): Remove. >> (vec_unpacku_hi_v8hi): Likewise. >> (vec_unpacku_lo_v16qi): Likewise. >> (vec_unpacku_lo_v8hi): Likewise. >> (vec_unpacku_hi_<VP_small_lc>): New define_expand. >> (vec_unpacku_lo_<VP_small_lc>): Likewise. > >> -(define_expand "vec_unpacku_hi_v16qi" >> - [(set (match_operand:V8HI 0 "register_operand" "=v") >> - (unspec:V8HI [(match_operand:V16QI 1 "register_operand" "v")] >> - UNSPEC_VUPKHUB))] >> - "TARGET_ALTIVEC" >> -{ >> - rtx vzero = gen_reg_rtx (V8HImode); >> - rtx mask = gen_reg_rtx (V16QImode); >> - rtvec v = rtvec_alloc (16); >> - bool be = BYTES_BIG_ENDIAN; >> - >> - emit_insn (gen_altivec_vspltish (vzero, const0_rtx)); >> - >> - RTVEC_ELT (v, 0) = gen_rtx_CONST_INT (QImode, be ? 16 : 7); >> - RTVEC_ELT (v, 1) = gen_rtx_CONST_INT (QImode, be ? 0 : 16); >> - RTVEC_ELT (v, 2) = gen_rtx_CONST_INT (QImode, be ? 16 : 6); >> - RTVEC_ELT (v, 3) = gen_rtx_CONST_INT (QImode, be ? 1 : 16); >> - RTVEC_ELT (v, 4) = gen_rtx_CONST_INT (QImode, be ? 16 : 5); >> - RTVEC_ELT (v, 5) = gen_rtx_CONST_INT (QImode, be ? 2 : 16); >> - RTVEC_ELT (v, 6) = gen_rtx_CONST_INT (QImode, be ? 16 : 4); >> - RTVEC_ELT (v, 7) = gen_rtx_CONST_INT (QImode, be ? 3 : 16); >> - RTVEC_ELT (v, 8) = gen_rtx_CONST_INT (QImode, be ? 16 : 3); >> - RTVEC_ELT (v, 9) = gen_rtx_CONST_INT (QImode, be ? 4 : 16); >> - RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, be ? 16 : 2); >> - RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, be ? 5 : 16); >> - RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, be ? 16 : 1); >> - RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, be ? 6 : 16); >> - RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, be ? 16 : 0); >> - RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, be ? 7 : 16); >> - >> - emit_insn (gen_vec_initv16qiqi (mask, gen_rtx_PARALLEL (V16QImode, v))); >> - emit_insn (gen_vperm_v16qiv8hi (operands[0], operands[1], vzero, mask)); >> - DONE; >> -}) > > So I wonder if all this still generates good code. The unspecs cannot > be optimised properly, the RTL can (in principle, anyway: it is possible > it makes more opportunities to use unpack etc. insns invisible than that > it helps over unspec. This needs to be tested, and the usual idioms > need testcases, is that what you add here? (/me reads on...) >
Yeah, for existing char/short, it generates better codes with vector merging high/low instead of permutation, by saving the cost for the permutation control vector (space in constant area as well as the cost to initialize it in prologue). The iterator writing makes it concise and also add the missing "int" support. The associated test cases are to verify new generated assembly and runtime result. >> + if (BYTES_BIG_ENDIAN) >> + emit_insn (gen_altivec_vmrgh<VU_char> (res, vzero, op1)); >> + else >> + emit_insn (gen_altivec_vmrgl<VU_char> (res, op1, vzero)); > > Ah, so it is *not* using unspecs? Excellent. > > Okay for trunk. Thank you! > Thanks for the review! Committed in r12-3134. BR, Kewen