Hi,

On 2020/9/1 21:07, Richard Biener wrote:
> On Tue, Sep 1, 2020 at 10:11 AM luoxhu via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
>>
>> Hi,
>>
>> On 2020/9/1 01:04, Segher Boessenkool wrote:
>>> Hi!
>>>
>>> On Mon, Aug 31, 2020 at 04:06:47AM -0500, Xiong Hu Luo wrote:
>>>> vec_insert accepts 3 arguments, arg0 is input vector, arg1 is the value
>>>> to be insert, arg2 is the place to insert arg1 to arg0.  This patch adds
>>>> __builtin_vec_insert_v4si[v4sf,v2di,v2df,v8hi,v16qi] for vec_insert to
>>>> not expand too early in gimple stage if arg2 is variable, to avoid generate
>>>> store hit load instructions.
>>>>
>>>> For Power9 V4SI:
>>>>       addi 9,1,-16
>>>>       rldic 6,6,2,60
>>>>       stxv 34,-16(1)
>>>>       stwx 5,9,6
>>>>       lxv 34,-16(1)
>>>> =>
>>>>       addis 9,2,.LC0@toc@ha
>>>>       addi 9,9,.LC0@toc@l
>>>>       mtvsrwz 33,5
>>>>       lxv 32,0(9)
>>>>       sradi 9,6,2
>>>>       addze 9,9
>>>>       sldi 9,9,2
>>>>       subf 9,9,6
>>>>       subfic 9,9,3
>>>>       sldi 9,9,2
>>>>       subfic 9,9,20
>>>>       lvsl 13,0,9
>>>>       xxperm 33,33,45
>>>>       xxperm 32,32,45
>>>>       xxsel 34,34,33,32
>>>
>>> For v a V4SI, x a SI, j some int,  what do we generate for
>>>     v[j&3] = x;
>>> ?
>>> This should be exactly the same as we generate for
>>>     vec_insert(x, v, j);
>>> (the builtin does a modulo 4 automatically).
>>
>> No, even with my patch "stxv 34,-16(1);stwx 5,9,6;lxv 34,-16(1)" generated 
>> currently.
>> Is it feasible and acceptable to expand some kind of pattern in expander 
>> directly without
>> builtin transition?
>>
>> I borrowed some of implementation from vec_extract.  For vec_extract, the 
>> issue also exists:
>>
>>             source:                             gimple:                      
>>        expand:                                        asm:
>> 1) i = vec_extract (v, n);  =>  i = __builtin_vec_ext_v4si (v, n);   => 
>> {r120:SI=unspec[r118:V4SI,r119:DI] 134;...}     => slwi 9,6,2   vextuwrx 
>> 3,9,2
>> 2) i = vec_extract (v, 3);  =>  i = __builtin_vec_ext_v4si (v, 3);   => 
>> {r120:SI=vec_select(r118:V4SI,parallel)...}     =>  li 9,12      vextuwrx 
>> 3,9,2
>> 3) i = v[n%4];   =>  _1 = n & 3;  i = VIEW_CONVERT_EXPR<int[4]>(v)[_1];  =>  
>>       ...        =>     stxv 34,-16(1);addi 9,1,-16; rldic 5,5,2,60; lwax 
>> 3,9,5
>> 4) i = v[3];     =>          i = BIT_FIELD_REF <v, 32, 96>;              =>  
>> {r120:SI=vec_select(r118:V4SI,parallel)...} => li 9,12;   vextuwrx 3,9,2
> 
> Why are 1) and 2) handled differently than 3)/4)?

1) and 2) are calling builtin function vec_extract, it is defined to
 __builtin_vec_extract and will be resolved to ALTIVEC_BUILTIN_VEC_EXTRACT
 by resolve_overloaded_builtin, to generate a call __builtin_vec_ext_v4si
 to be expanded only in RTL. 
3) is access variable v as array type with opcode VIEW_CONVERT_EXPR, I
 guess we should also generate builtin call instead of calling
 convert_vector_to_array_for_subscript to generate VIEW_CONVERT_EXPR
 expression for such kind of usage.
4) is translated to BIT_FIELD_REF with constant bitstart and bitsize,
variable v could also be accessed by register instead of stack, so optabs
could match the rs6000_expand_vector_insert to generate expected instruction
through extract_bit_field.

> 
>> Case 3) also couldn't handle the similar usage, and case 4) doesn't generate 
>> builtin as expected,
>> it just expand to vec_select by coincidence.  So does this mean both 
>> vec_insert and vec_extract
>> and all other similar vector builtins should use IFN as suggested by Richard 
>> Biener, to match the
>> pattern in gimple and expand both constant and variable index in expander?  
>> Will this also be
>> beneficial for other targets except power?  Or we should do that gradually 
>> after this patch
>> approved as it seems another independent issue?  Thanks:)
> 
> If the code generated for 3)/4) isn't optimal you have to figure why
> by tracing the RTL
> expansion code and looking for missing optabs.
> 
> Consider the amount of backend code you need to write if ever using
> those in constexpr
> context ...

It seems too complicated to expand the "i = VIEW_CONVERT_EXPR<int[4]>(v)[_1];"
or "VIEW_CONVERT_EXPR<int[4]>(v1)[_1] = i_6(D);" to
rs6000_expand_vector_insert/rs6000_expand_vector_extract in RTL, as:
1) Vector v is stored to stack with array type; need extra load and store 
operation.
2) Requires amount of code to decompose VIEW_CONVERT_EXPR to extract the vector 
and
index then call rs6000_expand_vector_insert/rs6000_expand_vector_extract.

which means replace following instructions #9~#12 to new instruction sequences:
    1: NOTE_INSN_DELETED
    6: NOTE_INSN_BASIC_BLOCK 2
    2: r119:V4SI=%2:V4SI
    3: r120:DI=%5:DI
    4: r121:DI=%6:DI
    5: NOTE_INSN_FUNCTION_BEG
    8: [r112:DI]=r119:V4SI

    9: r122:DI=r121:DI&0x3
   10: r123:DI=r122:DI<<0x2
   11: r124:DI=r112:DI+r123:DI
   12: [r124:DI]=r120:DI#0

   13: r118:V4SI=[r112:DI]
   17: %2:V4SI=r118:V4SI
   18: use %2:V4SI

=>

    1: NOTE_INSN_DELETED
    6: NOTE_INSN_BASIC_BLOCK 2
    2: r119:V4SI=%2:V4SI
    3: r120:DI=%5:DI
    4: r121:DI=%6:DI
    5: NOTE_INSN_FUNCTION_BEG
    8: [r112:DI]=r119:V4SI

r130:V4SI=[r112:DI]
rs6000_expand_vector_insert (r130,  r121:DI&0x3, r120:DI#0)
[r112:DI]=r130:V4SI

   13: r118:V4SI=[r112:DI]
   17: %2:V4SI=r118:V4SI
   18: use %2:V4SI

so maybe bypass convert_vector_to_array_for_subscript for special circumstance
like "i = v[n%4]" or "v[n&3]=i" to generate vec_extract or vec_insert builtin 
call a relative simpler method?

Xionghu

Reply via email to