on 2021/8/24 下午9:02, Segher Boessenkool wrote:
> Hi Ke Wen,
> 
> On Mon, Aug 09, 2021 at 10:53:00AM +0800, Kewen.Lin wrote:
>> on 2021/8/6 下午9:10, Bill Schmidt wrote:
>>> On 8/4/21 9:06 PM, Kewen.Lin wrote:
>>>> The existing vec_unpacku_{hi,lo} supports emulated unsigned
>>>> unpacking for short and char but misses the support for int.
>>>> This patch adds the support for vec_unpacku_{hi,lo}_v4si.
> 
>>      * config/rs6000/altivec.md (vec_unpacku_hi_v16qi): Remove.
>>      (vec_unpacku_hi_v8hi): Likewise.
>>      (vec_unpacku_lo_v16qi): Likewise.
>>      (vec_unpacku_lo_v8hi): Likewise.
>>      (vec_unpacku_hi_<VP_small_lc>): New define_expand.
>>      (vec_unpacku_lo_<VP_small_lc>): Likewise.
> 
>> -(define_expand "vec_unpacku_hi_v16qi"
>> -  [(set (match_operand:V8HI 0 "register_operand" "=v")
>> -        (unspec:V8HI [(match_operand:V16QI 1 "register_operand" "v")]
>> -                     UNSPEC_VUPKHUB))]
>> -  "TARGET_ALTIVEC"      
>> -{  
>> -  rtx vzero = gen_reg_rtx (V8HImode);
>> -  rtx mask = gen_reg_rtx (V16QImode);
>> -  rtvec v = rtvec_alloc (16);
>> -  bool be = BYTES_BIG_ENDIAN;
>> -   
>> -  emit_insn (gen_altivec_vspltish (vzero, const0_rtx));
>> -   
>> -  RTVEC_ELT (v,  0) = gen_rtx_CONST_INT (QImode, be ? 16 :  7);
>> -  RTVEC_ELT (v,  1) = gen_rtx_CONST_INT (QImode, be ?  0 : 16);
>> -  RTVEC_ELT (v,  2) = gen_rtx_CONST_INT (QImode, be ? 16 :  6);
>> -  RTVEC_ELT (v,  3) = gen_rtx_CONST_INT (QImode, be ?  1 : 16);
>> -  RTVEC_ELT (v,  4) = gen_rtx_CONST_INT (QImode, be ? 16 :  5);
>> -  RTVEC_ELT (v,  5) = gen_rtx_CONST_INT (QImode, be ?  2 : 16);
>> -  RTVEC_ELT (v,  6) = gen_rtx_CONST_INT (QImode, be ? 16 :  4);
>> -  RTVEC_ELT (v,  7) = gen_rtx_CONST_INT (QImode, be ?  3 : 16);
>> -  RTVEC_ELT (v,  8) = gen_rtx_CONST_INT (QImode, be ? 16 :  3);
>> -  RTVEC_ELT (v,  9) = gen_rtx_CONST_INT (QImode, be ?  4 : 16);
>> -  RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, be ? 16 :  2);
>> -  RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, be ?  5 : 16);
>> -  RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, be ? 16 :  1);
>> -  RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, be ?  6 : 16);
>> -  RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, be ? 16 :  0);
>> -  RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, be ?  7 : 16);
>> -
>> -  emit_insn (gen_vec_initv16qiqi (mask, gen_rtx_PARALLEL (V16QImode, v)));
>> -  emit_insn (gen_vperm_v16qiv8hi (operands[0], operands[1], vzero, mask));
>> -  DONE;
>> -})
> 
> So I wonder if all this still generates good code.  The unspecs cannot
> be optimised properly, the RTL can (in principle, anyway: it is possible
> it makes more opportunities to use unpack etc. insns invisible than that
> it helps over unspec.  This needs to be tested, and the usual idioms
> need testcases, is that what you add here?  (/me reads on...)
> 

Yeah, for existing char/short, it generates better codes with vector
merging high/low instead of permutation, by saving the cost for the
permutation control vector (space in constant area as well as the cost
to initialize it in prologue).  The iterator writing makes it concise
and also add the missing "int" support.  The associated test cases are
to verify new generated assembly and runtime result.

>> +  if (BYTES_BIG_ENDIAN)
>> +    emit_insn (gen_altivec_vmrgh<VU_char> (res, vzero, op1));
>> +  else
>> +    emit_insn (gen_altivec_vmrgl<VU_char> (res, op1, vzero));
> 
> Ah, so it is *not* using unspecs?  Excellent.
> 
> Okay for trunk.  Thank you!
> 

Thanks for the review!  Committed in r12-3134.


BR,
Kewen

Reply via email to