On Fri, 23 Jun 2023 at 14:58, Richard Biener <richard.guent...@gmail.com> wrote:
>
> On Fri, Jun 23, 2023 at 11:09 AM Prathamesh Kulkarni
> <prathamesh.kulka...@linaro.org> wrote:
> >
> > On Thu, 22 Jun 2023 at 18:06, Richard Biener <richard.guent...@gmail.com> wrote:
> > >
> > > On Thu, Jun 22, 2023 at 11:08 AM Prathamesh Kulkarni
> > > <prathamesh.kulka...@linaro.org> wrote:
> > > >
> > > > On Tue, 20 Jun 2023 at 16:47, Richard Biener 
> > > > <richard.guent...@gmail.com> wrote:
> > > > >
> > > > > On Tue, Jun 20, 2023 at 11:56 AM Prathamesh Kulkarni via Gcc-patches
> > > > > <gcc-patches@gcc.gnu.org> wrote:
> > > > > >
> > > > > > Hi Richard,
> > > > > > > For the following reduced test-case taken from the PR:
> > > > > >
> > > > > > #include "arm_sve.h"
> > > > > > svuint32_t l() {
> > > > > >   alignas(16) const unsigned int lanes[4] = {0, 0, 0, 0};
> > > > > >   return svld1rq_u32(svptrue_b8(), lanes);
> > > > > > }
> > > > > >
> > > > > > > compiling with -O3 -mcpu=generic+sve results in the following ICE:
> > > > > > during GIMPLE pass: fre
> > > > > > pr110280.c: In function 'l':
> > > > > > > pr110280.c:5:1: internal compiler error: in eliminate_stmt, at tree-ssa-sccvn.cc:6890
> > > > > > >     5 | }
> > > > > > >       | ^
> > > > > > > 0x865fb1 eliminate_dom_walker::eliminate_stmt(basic_block_def*, gimple_stmt_iterator*)
> > > > > > >         ../../gcc/gcc/tree-ssa-sccvn.cc:6890
> > > > > > > 0x120bf4d eliminate_dom_walker::before_dom_children(basic_block_def*)
> > > > > > >         ../../gcc/gcc/tree-ssa-sccvn.cc:7324
> > > > > > > 0x120bf4d eliminate_dom_walker::before_dom_children(basic_block_def*)
> > > > > > >         ../../gcc/gcc/tree-ssa-sccvn.cc:7257
> > > > > > 0x1aeec77 dom_walker::walk(basic_block_def*)
> > > > > >         ../../gcc/gcc/domwalk.cc:311
> > > > > > 0x11fd924 eliminate_with_rpo_vn(bitmap_head*)
> > > > > >         ../../gcc/gcc/tree-ssa-sccvn.cc:7504
> > > > > > 0x1214664 do_rpo_vn_1
> > > > > >         ../../gcc/gcc/tree-ssa-sccvn.cc:8616
> > > > > > 0x1215ba5 execute
> > > > > >         ../../gcc/gcc/tree-ssa-sccvn.cc:8702
> > > > > >
> > > > > > cc1 simplifies:
> > > > > >   lanes[0] = 0;
> > > > > >   lanes[1] = 0;
> > > > > >   lanes[2] = 0;
> > > > > >   lanes[3] = 0;
> > > > > >   _1 = { -1, ... };
> > > > > >   _7 = svld1rq_u32 (_1, &lanes);
> > > > > >
> > > > > > to:
> > > > > > >   _9 = MEM <vector(4) unsigned int> [(unsigned int * {ref-all})&lanes];
> > > > > >   _7 = VEC_PERM_EXPR <_9, _9, { 0, 1, 2, 3, ... }>;
> > > > > >
> > > > > > and then fre1 dump shows:
> > > > > > Applying pattern match.pd:8675, generic-match-5.cc:9025
> > > > > > > Match-and-simplified VEC_PERM_EXPR <_9, _9, { 0, 1, 2, 3, ... }> to { 0, 0, 0, 0 }
> > > > > > > RHS VEC_PERM_EXPR <_9, _9, { 0, 1, 2, 3, ... }> simplified to { 0, 0, 0, 0 }
> > > > > >
> > > > > > The issue seems to be with the following pattern:
> > > > > > (simplify
> > > > > >  (vec_perm vec_same_elem_p@0 @0 @1)
> > > > > >  @0)
> > > > > >
> > > > > > > which simplifies the above VEC_PERM_EXPR to:
> > > > > > _7 = {0, 0, 0, 0}
> > > > > > which is incorrect since _9 and mask have different vector lengths.
> > > > > >
> > > > > > > The attached patch amends the pattern to simplify the above VEC_PERM_EXPR
> > > > > > > only if the operand and mask have the same number of elements, which seems
> > > > > > > to fix the issue, and we're left with the following in the .optimized dump:
> > > > > >   <bb 2> [local count: 1073741824]:
> > > > > > >   _2 = VEC_PERM_EXPR <{ 0, 0, 0, 0 }, { 0, 0, 0, 0 }, { 0, 1, 2, 3, ... }>;
> > > > >
> > > > > it would be nice to have this optimized.
> > > > >
> > > > > -
> > > > >  (simplify
> > > > >   (vec_perm vec_same_elem_p@0 @0 @1)
> > > > > - @0)
> > > > > + (if (known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (@0)),
> > > > > +               TYPE_VECTOR_SUBPARTS (TREE_TYPE (@1))))
> > > > > +  @0))
> > > > >
> > > > > > that looks good I think.  Maybe even better use 'type' instead of
> > > > > > TREE_TYPE (@1) since that's more obviously the return type, in which case
> > > > >
> > > > > >   (if (types_match (type, TREE_TYPE (@0)))
> > > > >
> > > > > would be more to the point.
> > > > >
> > > > > > But to simplify this in the !known_eq case, can't you do a simple
> > > > >
> > > > >   { build_vector_from_val (type, the-element); }
> > > > >
> > > > > > ?  The 'vec_same_elem_p' predicate doesn't give you the element, so
> > > > >
> > > > >  (with { tree el = uniform_vector_p (@0); }
> > > > >   (if (el)
> > > > >    { build_vector_from_val (type, el); })))
> > > > >
> > > > > would be the cheapest workaround.
> > > > Hi Richard,
> > > > Thanks for the suggestions. Using build_vector_from_val simplifies it to:
> > > >   <bb 2> [local count: 1073741824]:
> > > >   return { 0, ... };
> > > >
> > > > Patch is bootstrapped+tested on aarch64-linux-gnu, in progress on
> > > > x86_64-linux-gnu.
> > > > OK to commit ?
> > >
> > > Can you retain the case of matching type?  Like
> > >
> > >   (if (types_match (type, TREE_TYPE (@0)))
> > >    @0
> > >    (with
> > >     {
> > >       tree elem = uniform_vector_p (@0);
> > >     }
> > >     (if (elem)
> > >      { build_vector_from_val (type, elem); })))
> > >
> > > ?  Because uniform_vector_p is strictly less powerful than (vec_same_elem_p ...)
> > >
> > > OK with that change.
> > Thanks, does the attached patch look OK ?
>
> OK.
Thanks, pushed to trunk in 85d8e0d8d5342ec8b4e6a54e22741c30b33c6f04.

Thanks,
Prathamesh
>
> > Bootstrapped+tested on aarch64-linux-gnu and x86_64-linux-gnu.
> >
> > Thanks,
> > Prathamesh
> > >
> > > Richard.
> > >
> > >
> > > >
> > > > Thanks,
> > > > Prathamesh
> > > > >
> > > > > >   return _2;
> > > > > >
> > > > > > code-gen:
> > > > > > l:
> > > > > >         mov     z0.b, #0
> > > > > >         ret
> > > > > >
> > > > > > Patch is bootstrapped+tested on aarch64-linux-gnu.
> > > > > > OK to commit ?
> > > > > >
> > > > > > Thanks,
> > > > > > Prathamesh
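
For readers following the thread, here is a minimal C sketch of the lane-count
issue discussed above. It is a toy model and not GCC code: uniform_elem and
fold_uniform_perm are hypothetical helpers that only mimic the roles played by
uniform_vector_p and build_vector_from_val in the final pattern, and a
variable-length result is modelled as a plain run-time lane count. What it
tries to show is that the folded result needs the lane count of the result
type (the mask's), not that of the uniform input.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not a GCC function): returns true and stores the
   repeated value in *elem when all in_lanes elements of v are equal.  */
static bool uniform_elem(const unsigned *v, size_t in_lanes, unsigned *elem)
{
    assert(in_lanes > 0);
    for (size_t i = 1; i < in_lanes; i++)
        if (v[i] != v[0])
            return false;
    *elem = v[0];
    return true;
}

/* Hypothetical model of folding VEC_PERM_EXPR <v, v, mask> for a uniform v.
   The result has out_lanes elements, which may differ from in_lanes, so the
   fold splats the element across the result width instead of reusing v.
   Returns false when v is not uniform and no fold applies.  */
static bool fold_uniform_perm(const unsigned *v, size_t in_lanes,
                              unsigned *out, size_t out_lanes)
{
    unsigned elem;
    if (!uniform_elem(v, in_lanes, &elem))
        return false;
    for (size_t i = 0; i < out_lanes; i++)
        out[i] = elem;
    return true;
}
```

Calling fold_uniform_perm with a 4-lane all-zero input and an 8-lane output
buffer fills all 8 output lanes with zero, which corresponds to the
"return { 0, ... };" result in the thread. Returning the 4-lane input
unchanged, as the original pattern effectively did, would leave the result
with the wrong width.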
