On Wed, 24 Feb 2021, Tamar Christina wrote:

> Hi Richi,
> 
> This is an updated patch with your suggestion.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?

OK.

Thanks,
Richard.

> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>       PR tree-optimization/99220
>       * tree-vect-slp.c (optimize_load_redistribution_1): Remove
>       node from cache when it's about to be deleted.
> 
> gcc/testsuite/ChangeLog:
> 
>       PR tree-optimization/99220
>       * g++.dg/vect/pr99220.cc: New test.
> 
> The 02/24/2021 08:52, Richard Biener wrote:
> > On Tue, 23 Feb 2021, Tamar Christina wrote:
> > 
> > > Hi Richi,
> > > 
> > > The attached testcase shows a bug where two nodes end up with the same 
> > > pointer.
> > > During the loop that analyzes all the instances
> > > in optimize_load_redistribution_1 we do
> > > 
> > >       if (value)
> > >         {
> > >           SLP_TREE_REF_COUNT (value)++;
> > >           SLP_TREE_CHILDREN (root)[i] = value;
> > >           vect_free_slp_tree (node);
> > >         }
> > > 
> > > when doing a replacement.  When this is done and the refcount for the node
> > > reaches 0, the node is removed, which allows the libc to return the 
> > > pointer
> > > again in the next call to new, which it does..
> > > 
> > > First instance
> > > 
> > > note:   node 0x5325f48 (max_nunits=1, refcnt=2)
> > > note:   op: VEC_PERM_EXPR
> > > note:           { }
> > > note:           lane permutation { 0[0] 1[1] 0[2] 1[3] }
> > > note:           children 0x5325db0 0x5325200
> > > 
> > > Second instance
> > > 
> > > note:   node 0x5325f48 (max_nunits=1, refcnt=1)
> > > note:   op: VEC_PERM_EXPR
> > > note:           { }
> > > note:           lane permutation { 0[0] 1[1] }
> > > note:           children 0x53255b8 0x5325530
> > > 
> > > This will end up with the illegal construction of
> > > 
> > > note:   node 0x53258e8 (max_nunits=2, refcnt=2)
> > > note:   op template: slp_patt_57 = .COMPLEX_MUL (_16, _16);
> > > note:           stmt 0 _16 = _14 - _15;
> > > note:           stmt 1 _23 = _17 + _22;
> > > note:           children 0x53257d8 0x5325d28
> > > note:   node 0x53257d8 (max_nunits=2, refcnt=3)
> > > note:   op template: l$b_4 = MEM[(const struct a &)_3].b;
> > > note:           stmt 0 l$b_4 = MEM[(const struct a &)_3].b;
> > > note:           stmt 1 l$c_5 = MEM[(const struct a &)_3].c;
> > > note:           load permutation { 0 1 }
> > > note:   node 0x5325d28 (max_nunits=2, refcnt=8)
> > > note:   op template: l$b_4 = MEM[(const struct a &)_3].b;
> > > note:           stmt 0 l$b_4 = MEM[(const struct a &)_3].b;
> > > note:           stmt 1 l$c_5 = MEM[(const struct a &)_3].c;
> > > note:           stmt 2 l$b_4 = MEM[(const struct a &)_3].b;
> > > note:           stmt 3 l$c_5 = MEM[(const struct a &)_3].c;
> > > note:           load permutation { 0 1 0 1 }
> > > 
> > > To prevent this my initial thought was to add the temporary VEC_PERM_EXPR 
> > > nodes
> > > to the bst_map cache and increase their refcnt one more.  However since 
> > > bst_map
> > > is gated on scalar statements and these nodes have none we can't do that.
> > > 
> > > Instead I realized that load_map is really only a visited list at the top 
> > > level.
> > > So instead of returning the reference, we should return NULL.
> > > 
> > > What this means is that it will no replacement was found at that level.  
> > > This is
> > > fine since these VEC_PERM_EXPR are single use.  So while the any other 
> > > node is
> > > an indication to use the cache, VEC_PERM_EXPR are an indication to avoid 
> > > it.
> > 
> > I don't understand really.  Waiting for the other patch to be pushed so
> > I can eventually have a look, but see below.
> > 
> > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > > 
> > > Ok for master?
> > > 
> > > Thanks,
> > > Tamar
> > > 
> > > gcc/ChangeLog:
> > > 
> > >   PR tree-optimization/99220
> > >   * tree-vect-slp.c (optimize_load_redistribution_1): Don't use
> > >   VEC_PERM_EXPR in cache.
> > > 
> > > gcc/testsuite/ChangeLog:
> > > 
> > >   PR tree-optimization/99220
> > >   * g++.dg/vect/pr99220.cc: New test.
> > > 
> > > --- inline copy of patch -- 
> > > diff --git a/gcc/testsuite/g++.dg/vect/pr99220.cc 
> > > b/gcc/testsuite/g++.dg/vect/pr99220.cc
> > > new file mode 100755
> > > index 
> > > 0000000000000000000000000000000000000000..ff3058832b742414202a8ada0a9dafc72c9a54aa
> > > --- /dev/null
> > > +++ b/gcc/testsuite/g++.dg/vect/pr99220.cc
> > > @@ -0,0 +1,29 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-additional-options "-w -O3 -march=armv8.3-a" { target { 
> > > aarch64*-*-* } } } */
> > > +
> > > +class a {
> > > +  float b;
> > > +  float c;
> > > +
> > > +public:
> > > +  a(float d, float e) : b(d), c(e) {}
> > > +  a operator+(a d) { return a(b + d.b, c + d.c); }
> > > +  a operator-(a d) { return a(b - d.b, c - d.c); }
> > > +  a operator*(a d) { return a(b * b - c * c, b * c + c * d.b); }
> > > +};
> > > +long f;
> > > +a *g;
> > > +class {
> > > +  a *h;
> > > +  long i;
> > > +  a *j;
> > > +
> > > +public:
> > > +  void k() {
> > > +    a l = h[0], m = g[i], n = l * g[1], o = l * j[8];
> > > +    g[i] = m + n;
> > > +    g[i + 1] = m - n;
> > > +    j[f] = o;
> > > +  }
> > > +} p;
> > > +main() { p.k(); }
> > > diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
> > > index 
> > > 605873714a5cafaaf822f61f1f769f96b3876694..e631463be8fc5b2de355e674a9c96665beb9516c
> > >  100644
> > > --- a/gcc/tree-vect-slp.c
> > > +++ b/gcc/tree-vect-slp.c
> > > @@ -2292,7 +2292,12 @@ optimize_load_redistribution_1 
> > > (scalar_stmts_to_slp_tree_map_t *bst_map,
> > >                           slp_tree root)
> > >  {
> > >    if (slp_tree *leader = load_map->get (root))
> > > -    return *leader;
> > > +    {
> > > +      if (SLP_TREE_CODE (root) == VEC_PERM_EXPR)
> > > + return NULL;
> > 
> > But this will then only optimize the first occurance.  Wouldn't it be
> > better to increase the refcount at
> > 
> >       load_map->put (root, node);
> > 
> > and walk load_map at the end, releasing refs owned by it like we do
> > for bst_map?
> > 
> > > +      else
> > > + return *leader;
> > > +    }
> > >  
> > >    load_map->put (root, NULL);
> > >  
> > > 
> > > 
> > > 
> > 
> > -- 
> > Richard Biener <rguent...@suse.de>
> > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> 
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)

Reply via email to