On Wed, 24 Feb 2021, Tamar Christina wrote:

> Hi Richi,
>
> This is an updated patch with your suggestion.
>
> Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
>
> Ok for master?
OK.

Thanks,
Richard.

> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>         PR tree-optimization/99220
>         * tree-vect-slp.c (optimize_load_redistribution_1): Remove
>         node from cache when it's about to be deleted.
>
> gcc/testsuite/ChangeLog:
>
>         PR tree-optimization/99220
>         * g++.dg/vect/pr99220.cc: New test.
>
> The 02/24/2021 08:52, Richard Biener wrote:
> > On Tue, 23 Feb 2021, Tamar Christina wrote:
> >
> > > Hi Richi,
> > >
> > > The attached testcase shows a bug where two nodes end up with the same
> > > pointer.  During the loop that analyzes all the instances in
> > > optimize_load_redistribution_1 we do
> > >
> > >   if (value)
> > >     {
> > >       SLP_TREE_REF_COUNT (value)++;
> > >       SLP_TREE_CHILDREN (root)[i] = value;
> > >       vect_free_slp_tree (node);
> > >     }
> > >
> > > when doing a replacement.  When this is done and the refcount for the
> > > node reaches 0, the node is freed, which allows the allocator to return
> > > the same pointer again on the next call to new, and it does.
> > >
> > > First instance
> > >
> > > note: node 0x5325f48 (max_nunits=1, refcnt=2)
> > > note: op: VEC_PERM_EXPR
> > > note: { }
> > > note: lane permutation { 0[0] 1[1] 0[2] 1[3] }
> > > note: children 0x5325db0 0x5325200
> > >
> > > Second instance
> > >
> > > note: node 0x5325f48 (max_nunits=1, refcnt=1)
> > > note: op: VEC_PERM_EXPR
> > > note: { }
> > > note: lane permutation { 0[0] 1[1] }
> > > note: children 0x53255b8 0x5325530
> > >
> > > This will end up with the illegal construction of
> > >
> > > note: node 0x53258e8 (max_nunits=2, refcnt=2)
> > > note: op template: slp_patt_57 = .COMPLEX_MUL (_16, _16);
> > > note: stmt 0 _16 = _14 - _15;
> > > note: stmt 1 _23 = _17 + _22;
> > > note: children 0x53257d8 0x5325d28
> > > note: node 0x53257d8 (max_nunits=2, refcnt=3)
> > > note: op template: l$b_4 = MEM[(const struct a &)_3].b;
> > > note: stmt 0 l$b_4 = MEM[(const struct a &)_3].b;
> > > note: stmt 1 l$c_5 = MEM[(const struct a &)_3].c;
> > > note: load permutation { 0 1 }
> > > note: node 0x5325d28 (max_nunits=2, refcnt=8)
> > > note: op template: l$b_4 = MEM[(const struct a &)_3].b;
> > > note: stmt 0 l$b_4 = MEM[(const struct a &)_3].b;
> > > note: stmt 1 l$c_5 = MEM[(const struct a &)_3].c;
> > > note: stmt 2 l$b_4 = MEM[(const struct a &)_3].b;
> > > note: stmt 3 l$c_5 = MEM[(const struct a &)_3].c;
> > > note: load permutation { 0 1 0 1 }
> > >
> > > To prevent this, my initial thought was to add the temporary
> > > VEC_PERM_EXPR nodes to the bst_map cache and increase their refcnt by
> > > one more.  However, since bst_map is keyed on scalar statements and
> > > these nodes have none, we can't do that.
> > >
> > > Instead I realized that load_map is really only a visited list at the
> > > top level.  So instead of returning the reference, we should return
> > > NULL.
> > >
> > > What this means is that no replacement was found at that level.  This
> > > is fine since these VEC_PERM_EXPR nodes are single use.  So while any
> > > other node is an indication to use the cache, a VEC_PERM_EXPR is an
> > > indication to avoid it.
> >
> > I don't really understand.  Waiting for the other patch to be pushed so
> > I can eventually have a look, but see below.
> >
> > > Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
> > >
> > > Ok for master?
> > >
> > > Thanks,
> > > Tamar
> > >
> > > gcc/ChangeLog:
> > >
> > >         PR tree-optimization/99220
> > >         * tree-vect-slp.c (optimize_load_redistribution_1): Don't use
> > >         VEC_PERM_EXPR in cache.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >         PR tree-optimization/99220
> > >         * g++.dg/vect/pr99220.cc: New test.
> > >
> > > --- inline copy of patch --
> > > diff --git a/gcc/testsuite/g++.dg/vect/pr99220.cc b/gcc/testsuite/g++.dg/vect/pr99220.cc
> > > new file mode 100755
> > > index 0000000000000000000000000000000000000000..ff3058832b742414202a8ada0a9dafc72c9a54aa
> > > --- /dev/null
> > > +++ b/gcc/testsuite/g++.dg/vect/pr99220.cc
> > > @@ -0,0 +1,29 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-additional-options "-w -O3 -march=armv8.3-a" { target { aarch64*-*-* } } } */
> > > +
> > > +class a {
> > > +  float b;
> > > +  float c;
> > > +
> > > +public:
> > > +  a(float d, float e) : b(d), c(e) {}
> > > +  a operator+(a d) { return a(b + d.b, c + d.c); }
> > > +  a operator-(a d) { return a(b - d.b, c - d.c); }
> > > +  a operator*(a d) { return a(b * b - c * c, b * c + c * d.b); }
> > > +};
> > > +long f;
> > > +a *g;
> > > +class {
> > > +  a *h;
> > > +  long i;
> > > +  a *j;
> > > +
> > > +public:
> > > +  void k() {
> > > +    a l = h[0], m = g[i], n = l * g[1], o = l * j[8];
> > > +    g[i] = m + n;
> > > +    g[i + 1] = m - n;
> > > +    j[f] = o;
> > > +  }
> > > +} p;
> > > +main() { p.k(); }
> > > diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
> > > index 605873714a5cafaaf822f61f1f769f96b3876694..e631463be8fc5b2de355e674a9c96665beb9516c 100644
> > > --- a/gcc/tree-vect-slp.c
> > > +++ b/gcc/tree-vect-slp.c
> > > @@ -2292,7 +2292,12 @@ optimize_load_redistribution_1 (scalar_stmts_to_slp_tree_map_t *bst_map,
> > >                                  slp_tree root)
> > >  {
> > >    if (slp_tree *leader = load_map->get (root))
> > > -    return *leader;
> > > +    {
> > > +      if (SLP_TREE_CODE (root) == VEC_PERM_EXPR)
> > > +        return NULL;
> >
> > But this will then only optimize the first occurrence.  Wouldn't it be
> > better to increase the refcount at
> >
> >   load_map->put (root, node);
> >
> > and walk load_map at the end, releasing refs owned by it like we do
> > for bst_map?
> >
> > > +      else
> > > +        return *leader;
> > > +    }
> > >
> > >    load_map->put (root, NULL);
> > >
> >
> > --
> > Richard Biener <rguent...@suse.de>
> > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)

--
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
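[Editor's note: the approved ChangeLog above ("Remove node from cache when
it's about to be deleted") is not shown as a diff in this thread.  Below is
a minimal sketch of that idea inside the replacement loop of
optimize_load_redistribution_1, reusing the names (value, node, root,
load_map) from the snippet quoted above.  The call to load_map->remove ()
and the exact placement are assumptions for illustration, not the verbatim
committed hunk.]

      if (value)
        {
          SLP_TREE_REF_COUNT (value)++;
          SLP_TREE_CHILDREN (root)[i] = value;
          /* Sketch (assumption): drop NODE from the visited cache before
             freeing it, so that a later allocation which happens to reuse
             the same address cannot alias a stale load_map entry.  */
          load_map->remove (node);
          vect_free_slp_tree (node);
        }

The point of the sketch is the ordering: the cache entry keyed on the node's
address must disappear before the node itself is released, otherwise the
recycled pointer can make two distinct SLP nodes look identical to the cache,
which is exactly the aliasing shown in the dumps above.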