On Wed, 18 Dec 2024 at 14:14, Tamar Christina <tamar.christ...@arm.com> wrote:
>
> > e791e52ec329277474f3218d8a44cd37ded14ac3..8101d868d0c5f7ac4f97931a
> > > ffcf71d826c88094 100644
> > > > --- a/libstdc++-v3/include/bits/hashtable.h
> > > > +++ b/libstdc++-v3/include/bits/hashtable.h
> > > > @@ -2171,7 +2171,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> > > >           if (this->_M_equals(__k, __code, *__p))
> > > >             return __prev_p;
> > > >
> > > > -         if (!__p->_M_nxt || _M_bucket_index(*__p->_M_next()) != __bkt)
> > > > +         if (__builtin_expect (!__p->_M_nxt || _M_bucket_index(*__p-
> > >_M_next())
> > > != __bkt, 0))
> > > >             break;
> > > >           __prev_p = __p;
> > > >         }
> > > > @@ -2201,7 +2201,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> > > >             if (this->_M_equals_tr(__k, __code, *__p))
> > > >               return __prev_p;
> > > >
> > > > -           if (!__p->_M_nxt || _M_bucket_index(*__p->_M_next()) != 
> > > > __bkt)
> > > > +           if (__builtin_expect (!__p->_M_nxt || _M_bucket_index(*__p-
> > > >_M_next()) != __bkt, 0))
> > > >               break;
> > > >             __prev_p = __p;
> > > >           }
> > > > @@ -2228,7 +2228,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> > > >                pointer_to(const_cast<__node_base&>(_M_before_begin));
> > > >           while (__loc._M_before->_M_nxt)
> > > >             {
> > > > -             if (this->_M_key_equals(__k, *__loc._M_node()))
> > > > +             if (__builtin_expect (this->_M_key_equals(__k, 
> > > > *__loc._M_node()), 1))
> > > >                 return __loc;
> > > >               __loc._M_before = __loc._M_before->_M_nxt;
> > > >             }
> > >
> > > The first two changes make sense to me: we're typically going to loop
> > > more than once when searching for an element, so the break is
> > > unlikely. The third change is a bit more surprising, because although
> > > it's likely we'll find an element *eventually* it is only likely once
> > > out of N iterations, not likely on every iteration. But if the numbers
> > > show it helps, then that just shows my intuition is probably wrong.
> > >
> > > If this is optimized for 'find' is it possible that this change
> > > pessimizes insertion (which also has to use _M_locate to find where to
> > > insert)?
> >
> > You're right there seems to be a uniform slowdown for insertions in small 
> > sized
> > hashtables of ~10 entries.  They're about 10-14% slower with that change.
> >
> > As expected just the ones on _M_find_before_node does not cause any issues.
> >
> > Since the insertions into small hashtables don't run long enough I've also 
> > increased
> > the number of iterations to ~1 million even out the score a bit more but 
> > there's still
> > a sizable 5 regressions.
> >
> > I've kicked off a longer run removing the change on _M_locate and with some
> > more
> > variable size finds/inserts.
> >
> > So far it looks like the additional benefits gotten on _M_locate were
> > mainly on two tests.  I'll look at the perf traces to figure out exactly 
> > why but I'll
> > respin
> > the patch without the _M_locate change and send it tomorrow morning once
> > benchmarks finish.
> >
>
> Hi,
>
> Here's a new patch and new numbers showing the improvements over the previous
> one and total improvement with the two together.  This change seems to be 
> mostly
> beneficial on larger sized hashmaps and otherwise no real losses on Insert or 
> Find.
> (around the region of ~1% which look like noise).
>
> which results in ~0-10% extra on top of the previous patch.
>
> In table form:
>
> +-------------+---------------+-------+--------------------+-------------------+-----------------+
> | benchmark   | Type          | Size  | Inline vs baseline | final vs 
> baseline | final vs inline |
> +-------------+---------------+-------+--------------------+-------------------+-----------------+
> | find many   | uint64_t      | 11253 | -15.67%            | -22.96%          
>  | -8.65%          |
> | find many   | uint64_t      | 11253 | -16.74%            | -23.37%          
>  | -7.96%          |
> | find single | uint64_t      | 345   | -5.88%             | -11.54%          
>  | -6.02%          |
> | find many   | string        | 11253 | -4.50%             | -9.56%           
>  | -5.29%          |
> | find single | uint64_t      | 345   | -4.38%             | -9.41%           
>  | -5.26%          |
> | find single | shared string | 11253 | -6.67%             | -11.00%          
>  | -4.64%          |
> | find single | shared string | 11253 | -4.63%             | -9.03%           
>  | -4.61%          |
> | find single | shared string | 345   | -10.41%            | -14.44%          
>  | -4.50%          |
> | find many   | string        | 11253 | -3.41%             | -7.51%           
>  | -4.24%          |
> | find many   | shared string | 11253 | -2.30%             | -5.72%           
>  | -3.50%          |
> | find many   | string        | 13    | 2.86%              | -0.30%           
>  | -3.07%          |
> | find single | string        | 11253 | 4.47%              | 1.34%            
>  | -3.00%          |
> | find many   | custom string | 11253 | 0.25%              | -2.75%           
>  | -2.99%          |
> | find single | uint64_t      | 345   | 2.99%              | 0.01%            
>  | -2.90%          |
> | find single | shared string | 345   | -11.53%            | -13.67%          
>  | -2.41%          |
> | find single | uint64_t      | 11253 | 0.49%              | -1.59%           
>  | -2.07%          |
> +-------------+---------------+-------+--------------------+-------------------+-----------------+
>
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master?

OK, thanks!


>
> Thanks,
> Tamar
>
> libstdc++-v3/ChangeLog:
>
>         * include/bits/hashtable.h
>         (_M_find_before_node): Make it likely that the map has at least one
>         entry and so we do at least one iteration.
>
> -- inline copy of patch --
>
> diff --git a/libstdc++-v3/include/bits/hashtable.h 
> b/libstdc++-v3/include/bits/hashtable.h
> index 
> e791e52ec329277474f3218d8a44cd37ded14ac3..51437619b44ce8b93be6a1d223828fda5a4a4c45
>  100644
> --- a/libstdc++-v3/include/bits/hashtable.h
> +++ b/libstdc++-v3/include/bits/hashtable.h
> @@ -2171,7 +2171,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>           if (this->_M_equals(__k, __code, *__p))
>             return __prev_p;
>
> -         if (!__p->_M_nxt || _M_bucket_index(*__p->_M_next()) != __bkt)
> +         if (__builtin_expect (!__p->_M_nxt || 
> _M_bucket_index(*__p->_M_next()) != __bkt, 0))
>             break;
>           __prev_p = __p;
>         }
> @@ -2201,7 +2201,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>             if (this->_M_equals_tr(__k, __code, *__p))
>               return __prev_p;
>
> -           if (!__p->_M_nxt || _M_bucket_index(*__p->_M_next()) != __bkt)
> +           if (__builtin_expect (!__p->_M_nxt || 
> _M_bucket_index(*__p->_M_next()) != __bkt, 0))
>               break;
>             __prev_p = __p;
>           }

Reply via email to