On Tue, Sep 1, 2020 at 10:11 AM luoxhu via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> Hi,
>
> On 2020/9/1 01:04, Segher Boessenkool wrote:
> > Hi!
> >
> > On Mon, Aug 31, 2020 at 04:06:47AM -0500, Xiong Hu Luo wrote:
> >> vec_insert accepts 3 arguments, arg0 is input vector, arg1 is the value
> >> to be insert, arg2 is the place to insert arg1 to arg0.  This patch adds
> >> __builtin_vec_insert_v4si[v4sf,v2di,v2df,v8hi,v16qi] for vec_insert to
> >> not expand too early in gimple stage if arg2 is variable, to avoid generate
> >> store hit load instructions.
> >>
> >> For Power9 V4SI:
> >>      addi 9,1,-16
> >>      rldic 6,6,2,60
> >>      stxv 34,-16(1)
> >>      stwx 5,9,6
> >>      lxv 34,-16(1)
> >> =>
> >>      addis 9,2,.LC0@toc@ha
> >>      addi 9,9,.LC0@toc@l
> >>      mtvsrwz 33,5
> >>      lxv 32,0(9)
> >>      sradi 9,6,2
> >>      addze 9,9
> >>      sldi 9,9,2
> >>      subf 9,9,6
> >>      subfic 9,9,3
> >>      sldi 9,9,2
> >>      subfic 9,9,20
> >>      lvsl 13,0,9
> >>      xxperm 33,33,45
> >>      xxperm 32,32,45
> >>      xxsel 34,34,33,32
> >
> > For v a V4SI, x a SI, j some int,  what do we generate for
> >    v[j&3] = x;
> > ?
> > This should be exactly the same as we generate for
> >    vec_insert(x, v, j);
> > (the builtin does a modulo 4 automatically).
>
> No, even with my patch "stxv 34,-16(1);stwx 5,9,6;lxv 34,-16(1)" generated 
> currently.
> Is it feasible and acceptable to expand some kind of pattern in expander 
> directly without
> builtin transition?
>
> I borrowed some of implementation from vec_extract.  For vec_extract, the 
> issue also exists:
>
>            source:                             gimple:                        
>      expand:                                        asm:
> 1) i = vec_extract (v, n);  =>  i = __builtin_vec_ext_v4si (v, n);   => 
> {r120:SI=unspec[r118:V4SI,r119:DI] 134;...}     => slwi 9,6,2   vextuwrx 3,9,2
> 2) i = vec_extract (v, 3);  =>  i = __builtin_vec_ext_v4si (v, 3);   => 
> {r120:SI=vec_select(r118:V4SI,parallel)...}     =>  li 9,12      vextuwrx 
> 3,9,2
> 3) i = v[n%4];   =>  _1 = n & 3;  i = VIEW_CONVERT_EXPR<int[4]>(v)[_1];  =>   
>      ...        =>     stxv 34,-16(1);addi 9,1,-16; rldic 5,5,2,60; lwax 3,9,5
> 4) i = v[3];     =>          i = BIT_FIELD_REF <v, 32, 96>;              =>  
> {r120:SI=vec_select(r118:V4SI,parallel)...} => li 9,12;   vextuwrx 3,9,2

Why are 1) and 2) handled differently than 3)/4)?

> Case 3) also couldn't handle the similar usage, and case 4) doesn't generate 
> builtin as expected,
> it just expand to vec_select by coincidence.  So does this mean both 
> vec_insert and vec_extract
> and all other similar vector builtins should use IFN as suggested by Richard 
> Biener, to match the
> pattern in gimple and expand both constant and variable index in expander?  
> Will this also be
> beneficial for other targets except power?  Or we should do that gradually 
> after this patch
> approved as it seems another independent issue?  Thanks:)

If the code generated for 3)/4) isn't optimal you have to figure why
by tracing the RTL
expansion code and looking for missing optabs.

Consider the amount of backend code you need to write if ever using
those in constexpr
context ...

Richard.

>
>
> Xionghu

Reply via email to