On Thu, Jun 5, 2025 at 6:44 PM Jonathan Wakely <jwak...@redhat.com> wrote:
> On Thu, 5 Jun 2025 at 15:42, Tomasz Kaminski <tkami...@redhat.com> wrote: > > > > > > > > On Thu, Jun 5, 2025 at 12:41 PM Jonathan Wakely <jwak...@redhat.com> > wrote: > >> > >> On Thu, 5 Jun 2025 at 10:00, Tomasz Kaminski <tkami...@redhat.com> > wrote: > >> > > >> > > >> > > >> > On Thu, Jun 5, 2025 at 12:06 AM Jonathan Wakely <jwak...@redhat.com> > wrote: > >> >> > >> >> Rename __semaphore_base to __semaphore_impl, because it's not used > as a > >> >> base class. Replace the three identical lambda expressions with a > named > >> >> class, __semaphore_impl::_Can_acquire, which stores the most recent > >> >> value of the counter as a data member. > >> >> > >> >> Add a new __platform_semaphore_impl class template which uses > >> >> __platform_wait_t for the counter and uses more efficient atomic > waits > >> >> for the acquire functions, when __platform_wait is available. For a > >> >> binary semaphore some members are further optimized because we know > the > >> >> counter can only be zero or one. > >> >> > >> >> Also add a bare wait flag to __atomic_wait_address_v, for consistency > >> >> with __atomic_wait_address_until_v and __atomic_wait_address_for_v > and > >> >> to allow semaphores to use it without the redundant overhead of > tracking > >> >> waiters. > >> >> > >> >> libstdc++-v3/ChangeLog: > >> >> > >> >> * include/bits/atomic_wait.h (__atomic_wait_address_v): Add > bare > >> >> wait flag. > >> >> * include/bits/semaphore_base.h (__semaphore_base): Rename to > >> >> __semaphore_impl. Replace local variable and predicate lambda > >> >> with _Can_acquire struct. > >> >> (__platform_semaphore_impl): New class template. > >> >> (__semaphore_impl): Remove alias template. > >> >> * include/std/semaphore: > >> >> --- > >> >> > >> >> This replaces the two patches: > >> >> > >> >> [PATCH 3/4] libstdc++: Optimize std::binary_semaphore > >> >> [PATCH 4/4] libstdc++: Optimize std::counting_semaphore for futex > path > >> >> > >> >> The problem with those patches was that the "optimized" > implementations > >> >> were used if _Max <= numeric_limits<__platform_wait_t>::max(), which > is > >> >> always true for many non-futex targets because we do: > >> >> > >> >> # if ATOMIC_LONG_LOCK_FREE == 2 > >> >> using __platform_wait_t = unsigned long; > >> >> # else > >> >> using __platform_wait_t = unsigned int; > >> >> # endif > >> >> > >> >> The earlier patches should have been checking either > >> >> _GLIBCXX_HAVE_PLATFORM_WAIT or the __platform_wait_uses_type variable > >> >> template to see if an efficient (i.e. not condition_variable-based) > >> >> implementation is actually supported. It's not useful to check if the > >> >> semaphore's maximum value fits in the __platform_wait_t type if that > >> >> type doesn't have fast wait/wake operations like a futex. > >> >> > >> >> The new proposal in this patch only selects the optimized > >> >> implementations if they can be implemented using __platform_wait. > The > >> >> __semaphore_impl always uses the portable atomic wait API (using an > >> >> accessor function and predicate). As Tomasz suggested, a new class > >> >> template provides the optimized implementation for both > >> >> counting_semaphore and binary_semaphore, differing only in > >> >> _M_get_current and _M_try_acquire. > >> >> > >> >> Tested x86_64-linux and sparc-solaris. > >> > > >> > LGTM, one major suggestion to encapsulate the __pred._M_cur == 0 into > named member > >> > function. > >> > I have also checked that __atomic_wait_address_until takes __pred by > reference, > >> > and does not make copies, so encapsulating __cur there is OK. > >> >> > >> >> > >> >> libstdc++-v3/include/bits/atomic_wait.h | 6 +- > >> >> libstdc++-v3/include/bits/semaphore_base.h | 241 > +++++++++++++++------ > >> >> libstdc++-v3/include/std/semaphore | 5 +- > >> >> 3 files changed, 186 insertions(+), 66 deletions(-) > >> >> > >> >> diff --git a/libstdc++-v3/include/bits/atomic_wait.h > b/libstdc++-v3/include/bits/atomic_wait.h > >> >> index 815726c16ccb..d32fda1c2435 100644 > >> >> --- a/libstdc++-v3/include/bits/atomic_wait.h > >> >> +++ b/libstdc++-v3/include/bits/atomic_wait.h > >> >> @@ -249,12 +249,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION > >> >> // C++26 will return __val > >> >> } > >> >> > >> >> + // Wait on __addr while *__addr == __old is true. > >> >> + // Cannot be used for proxy waits. > >> >> inline void > >> >> __atomic_wait_address_v(const __detail::__platform_wait_t* __addr, > >> >> __detail::__platform_wait_t __old, > >> >> - int __order) > >> >> + int __order, bool __bare_wait = false) > >> >> { > >> >> - __detail::__wait_args __args{ __addr, __old, __order }; > >> >> + __detail::__wait_args __args{ __addr, __old, __order, > __bare_wait }; > >> >> // C++26 will not ignore the return value here > >> >> __detail::__wait_impl(__addr, __args); > >> >> } > >> >> diff --git a/libstdc++-v3/include/bits/semaphore_base.h > b/libstdc++-v3/include/bits/semaphore_base.h > >> >> index 3f7a33ccd51a..a1311e8690c7 100644 > >> >> --- a/libstdc++-v3/include/bits/semaphore_base.h > >> >> +++ b/libstdc++-v3/include/bits/semaphore_base.h > >> >> @@ -46,23 +46,20 @@ namespace std _GLIBCXX_VISIBILITY(default) > >> >> { > >> >> _GLIBCXX_BEGIN_NAMESPACE_VERSION > >> >> > >> >> - template<bool _Platform_wait> > >> >> - struct __semaphore_base > >> >> + struct __semaphore_impl > >> >> { > >> >> - using __count_type = __conditional_t<_Platform_wait, > >> >> - __detail::__platform_wait_t, > >> >> - ptrdiff_t>; > >> >> + using __count_type = ptrdiff_t; > >> >> > >> >> static constexpr ptrdiff_t _S_max > >> >> = __gnu_cxx::__int_traits<__count_type>::__max; > >> >> > >> >> constexpr explicit > >> >> - __semaphore_base(__count_type __count) noexcept > >> >> + __semaphore_impl(__count_type __count) noexcept > >> >> : _M_counter(__count) > >> >> { } > >> >> > >> >> - __semaphore_base(const __semaphore_base&) = delete; > >> >> - __semaphore_base& operator=(const __semaphore_base&) = delete; > >> >> + __semaphore_impl(const __semaphore_impl&) = delete; > >> >> + __semaphore_impl& operator=(const __semaphore_impl&) = delete; > >> >> > >> >> // Load the current counter value. > >> >> _GLIBCXX_ALWAYS_INLINE __count_type > >> >> @@ -71,8 +68,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION > >> >> > >> >> // Try to acquire the semaphore (i.e. decrement the counter). > >> >> // Returns false if the current counter is zero, or if another > thread > >> >> - // decrements the value first. In the latter case, __cur is set > to the > >> >> - // new value. > >> >> + // changes the value first. In the latter case, __cur is set to > the new > >> >> + // value. > >> >> _GLIBCXX_ALWAYS_INLINE bool > >> >> _M_do_try_acquire(__count_type& __cur) noexcept > >> >> { > >> >> @@ -85,24 +82,19 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION > >> >> > memory_order::relaxed); > >> >> } > >> >> > >> >> + // Keep trying to acquire the semaphore in a loop until it > succeeds. > >> >> void > >> >> _M_acquire() noexcept > >> >> { > >> >> - auto const __vfn = [this]{ return _M_get_current(); }; > >> >> - auto __val = __vfn(); > >> >> - auto const __pred = [&__val](__count_type __cur) { > >> >> - if (__cur > 0) > >> >> - { > >> >> - __val = __cur; > >> >> - return true; > >> >> - } > >> >> - return false; > >> >> - }; > >> >> - while (!_M_do_try_acquire(__val)) > >> >> - if (__val == 0) > >> >> + auto __vfn = [this]{ return _M_get_current(); }; > >> >> + _Can_acquire __pred{__vfn()}; > >> >> + while (!_M_do_try_acquire(__pred._M_val)) > >> >> + if (__pred._M_val == 0) > >> >> std::__atomic_wait_address(&_M_counter, __pred, __vfn, > true); > >> >> } > >> >> > >> >> + // Try to acquire the semaphore, retrying a small number of > times > >> >> + // in case of contention. > >> >> bool > >> >> _M_try_acquire() noexcept > >> >> { > >> >> @@ -116,23 +108,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION > >> >> bool > >> >> _M_try_acquire_until(const chrono::time_point<_Clock, > _Duration>& __atime) noexcept > >> >> { > >> >> - auto const __vfn = [this]{ return _M_get_current(); }; > >> >> - auto __val = __vfn(); > >> >> - auto const __pred = [&__val](__count_type __cur) { > >> >> - if (__cur > 0) > >> >> - { > >> >> - __val = __cur; > >> >> - return true; > >> >> - } > >> >> - return false; > >> >> - }; > >> >> - while (!_M_do_try_acquire(__val)) > >> >> - if (__val == 0) > >> >> - { > >> >> - if (!std::__atomic_wait_address_until(&_M_counter, > __pred, > >> >> - __vfn, __atime, > true)) > >> >> - return false; // timed out > >> >> - } > >> >> + auto __vfn = [this]{ return _M_get_current(); }; > >> >> + _Can_acquire __pred{__vfn()}; > >> >> + while (!_M_do_try_acquire(__pred._M_val)) > >> >> + if (__pred._M_val == 0) > >> >> + if (!std::__atomic_wait_address_until(&_M_counter, > __pred, > >> >> + __vfn, __atime, > true)) > >> >> + return false; // timed out > >> >> return true; > >> >> } > >> >> > >> >> @@ -140,23 +122,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION > >> >> bool > >> >> _M_try_acquire_for(const chrono::duration<_Rep, _Period>& > __rtime) noexcept > >> >> { > >> >> - auto const __vfn = [this]{ return _M_get_current(); }; > >> >> - auto __val = __vfn(); > >> >> - auto const __pred = [&__val](__count_type __cur) { > >> >> - if (__cur > 0) > >> >> - { > >> >> - __val = __cur; > >> >> - return true; > >> >> - } > >> >> - return false; > >> >> - }; > >> >> - while (!_M_do_try_acquire(__val)) > >> >> - if (__val == 0) > >> >> - { > >> >> - if (!std::__atomic_wait_address_for(&_M_counter, > __pred, > >> >> - __vfn, __rtime, > true)) > >> >> - return false; // timed out > >> >> - } > >> >> + auto __vfn = [this]{ return _M_get_current(); }; > >> >> + _Can_acquire __pred{__vfn()}; > >> >> + while (!_M_do_try_acquire(__pred._M_val)) > >> >> + if (__pred._M_val == 0) > >> > > >> > I think we could have some named member funciton here, > >> > like __pred.should_wait(). But not strictly necessary. > >> > >> I like the suggestion. How about naming the predicate struct > differently: > >> > >> struct _Available > >> { > >> __count_type _M_val; // Cache of the last value loaded from > _M_counter. > >> > >> // Argument should be the latest value of the counter. > >> // Returns true (and caches the value) if it's non-zero and so > can be > >> // decremented to acquire the semaphore. Returns false otherwise. > >> bool operator()(__count_type __cur) noexcept > >> { > >> if (__cur == 0) > >> return false; > >> _M_val = __cur; > >> return true; > >> } > >> > >> // Returns true if the last-known value is non-zero. > >> explicit operator bool() const noexcept { _M_val > 0; } > >> }; > >> > >> And then it would be used like this: > >> > >> void > >> _M_acquire() noexcept > >> { > >> auto __vfn = [this]{ return _M_get_current(); }; > >> _Available __is_available{__vfn()}; > >> while (!_M_do_try_acquire(__is_available._M_val)) > >> if (!__is_available) > > > > I think the bool conversion here is too cute, the two uses look > "simialar". > > I don't see a problem with that, they *are* similar. One tests the > last known value, the other updates the last-known value and then > tests it. > > The operator() function could be implemented as: > > bool operator()(count_type cur) > { > _M_val = cur; > return static_cast<bool>(*this); > } > > That would be equivalent, but it's implemented slightly differently to > avoid a redundant store (don't bother to store zero, because it must > already be zero if the predicate was being tested by the waiting > function). > > I did consider naming the operator bool() as operator()() instead, so > the callable could be used with one argument (to update the cache and > test it) and with no arguments (to test the previously cached value). > I think the root of my concern was that, if is_available is used as a variable, then it looks like we are calling __atomic_wait_address overload that accepts a value. Having operator()(), and using it uniformly as a functor/predicate alleviates this. > > Then it would be used like this: > > void > _M_acquire() noexcept > { > auto __vfn = [this]{ return _M_get_current(); }; > _Available __is_available{__vfn()}; > while (!_M_do_try_acquire(__is_available._M_val)) > if (!__is_available()) > std::__atomic_wait_address(&_M_counter, __is_available, __vfn, > true); > } > > > Maye rename use a named member like "current()" or "last()", or > "is_available.currently()". > >> > >> std::__atomic_wait_address(&_M_counter, __is_available, > __vfn, true); > >> } > >> > >> So __is_available(vfn()) tests whether it's available based on the new > >> value returned by vfn(), which is the predicate used by the atomic > >> waiting functions, and (bool)__is_available tests whether it's > >> available based on the cached value, which is what we use to decide > >> whether to call the waiting function. > >> > >> Too cute? I think it makes the logic quite readable. > >> > >