On Thu, 5 Jun 2025 at 10:00, Tomasz Kaminski <tkami...@redhat.com> wrote: > > > > On Thu, Jun 5, 2025 at 12:06 AM Jonathan Wakely <jwak...@redhat.com> wrote: >> >> Rename __semaphore_base to __semaphore_impl, because it's not used as a >> base class. Replace the three identical lambda expressions with a named >> class, __semaphore_impl::_Can_acquire, which stores the most recent >> value of the counter as a data member. >> >> Add a new __platform_semaphore_impl class template which uses >> __platform_wait_t for the counter and uses more efficient atomic waits >> for the acquire functions, when __platform_wait is available. For a >> binary semaphore some members are further optimized because we know the >> counter can only be zero or one. >> >> Also add a bare wait flag to __atomic_wait_address_v, for consistency >> with __atomic_wait_address_until_v and __atomic_wait_address_for_v and >> to allow semaphores to use it without the redundant overhead of tracking >> waiters. >> >> libstdc++-v3/ChangeLog: >> >> * include/bits/atomic_wait.h (__atomic_wait_address_v): Add bare >> wait flag. >> * include/bits/semaphore_base.h (__semaphore_base): Rename to >> __semaphore_impl. Replace local variable and predicate lambda >> with _Can_acquire struct. >> (__platform_semaphore_impl): New class template. >> (__semaphore_impl): Remove alias template. >> * include/std/semaphore: >> --- >> >> This replaces the two patches: >> >> [PATCH 3/4] libstdc++: Optimize std::binary_semaphore >> [PATCH 4/4] libstdc++: Optimize std::counting_semaphore for futex path >> >> The problem with those patches was that the "optimized" implementations >> were used if _Max <= numeric_limits<__platform_wait_t>::max(), which is >> always true for many non-futex targets because we do: >> >> # if ATOMIC_LONG_LOCK_FREE == 2 >> using __platform_wait_t = unsigned long; >> # else >> using __platform_wait_t = unsigned int; >> # endif >> >> The earlier patches should have been checking either >> _GLIBCXX_HAVE_PLATFORM_WAIT or the __platform_wait_uses_type variable >> template to see if an efficient (i.e. not condition_variable-based) >> implementation is actually supported. It's not useful to check if the >> semaphore's maximum value fits in the __platform_wait_t type if that >> type doesn't have fast wait/wake operations like a futex. >> >> The new proposal in this patch only selects the optimized >> implementations if they can be implemented using __platform_wait. The >> __semaphore_impl always uses the portable atomic wait API (using an >> accessor function and predicate). As Tomasz suggested, a new class >> template provides the optimized implementation for both >> counting_semaphore and binary_semaphore, differing only in >> _M_get_current and _M_try_acquire. >> >> Tested x86_64-linux and sparc-solaris. > > LGTM, one major suggestion to encapsulate the __pred._M_cur == 0 into named > member > function. > I have also checked that __atomic_wait_address_until takes __pred by > reference, > and does not make copies, so encapsulating __cur there is OK. >> >> >> libstdc++-v3/include/bits/atomic_wait.h | 6 +- >> libstdc++-v3/include/bits/semaphore_base.h | 241 +++++++++++++++------ >> libstdc++-v3/include/std/semaphore | 5 +- >> 3 files changed, 186 insertions(+), 66 deletions(-) >> >> diff --git a/libstdc++-v3/include/bits/atomic_wait.h >> b/libstdc++-v3/include/bits/atomic_wait.h >> index 815726c16ccb..d32fda1c2435 100644 >> --- a/libstdc++-v3/include/bits/atomic_wait.h >> +++ b/libstdc++-v3/include/bits/atomic_wait.h >> @@ -249,12 +249,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION >> // C++26 will return __val >> } >> >> + // Wait on __addr while *__addr == __old is true. >> + // Cannot be used for proxy waits. >> inline void >> __atomic_wait_address_v(const __detail::__platform_wait_t* __addr, >> __detail::__platform_wait_t __old, >> - int __order) >> + int __order, bool __bare_wait = false) >> { >> - __detail::__wait_args __args{ __addr, __old, __order }; >> + __detail::__wait_args __args{ __addr, __old, __order, __bare_wait }; >> // C++26 will not ignore the return value here >> __detail::__wait_impl(__addr, __args); >> } >> diff --git a/libstdc++-v3/include/bits/semaphore_base.h >> b/libstdc++-v3/include/bits/semaphore_base.h >> index 3f7a33ccd51a..a1311e8690c7 100644 >> --- a/libstdc++-v3/include/bits/semaphore_base.h >> +++ b/libstdc++-v3/include/bits/semaphore_base.h >> @@ -46,23 +46,20 @@ namespace std _GLIBCXX_VISIBILITY(default) >> { >> _GLIBCXX_BEGIN_NAMESPACE_VERSION >> >> - template<bool _Platform_wait> >> - struct __semaphore_base >> + struct __semaphore_impl >> { >> - using __count_type = __conditional_t<_Platform_wait, >> - __detail::__platform_wait_t, >> - ptrdiff_t>; >> + using __count_type = ptrdiff_t; >> >> static constexpr ptrdiff_t _S_max >> = __gnu_cxx::__int_traits<__count_type>::__max; >> >> constexpr explicit >> - __semaphore_base(__count_type __count) noexcept >> + __semaphore_impl(__count_type __count) noexcept >> : _M_counter(__count) >> { } >> >> - __semaphore_base(const __semaphore_base&) = delete; >> - __semaphore_base& operator=(const __semaphore_base&) = delete; >> + __semaphore_impl(const __semaphore_impl&) = delete; >> + __semaphore_impl& operator=(const __semaphore_impl&) = delete; >> >> // Load the current counter value. >> _GLIBCXX_ALWAYS_INLINE __count_type >> @@ -71,8 +68,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION >> >> // Try to acquire the semaphore (i.e. decrement the counter). >> // Returns false if the current counter is zero, or if another thread >> - // decrements the value first. In the latter case, __cur is set to the >> - // new value. >> + // changes the value first. In the latter case, __cur is set to the new >> + // value. >> _GLIBCXX_ALWAYS_INLINE bool >> _M_do_try_acquire(__count_type& __cur) noexcept >> { >> @@ -85,24 +82,19 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION >> memory_order::relaxed); >> } >> >> + // Keep trying to acquire the semaphore in a loop until it succeeds. >> void >> _M_acquire() noexcept >> { >> - auto const __vfn = [this]{ return _M_get_current(); }; >> - auto __val = __vfn(); >> - auto const __pred = [&__val](__count_type __cur) { >> - if (__cur > 0) >> - { >> - __val = __cur; >> - return true; >> - } >> - return false; >> - }; >> - while (!_M_do_try_acquire(__val)) >> - if (__val == 0) >> + auto __vfn = [this]{ return _M_get_current(); }; >> + _Can_acquire __pred{__vfn()}; >> + while (!_M_do_try_acquire(__pred._M_val)) >> + if (__pred._M_val == 0) >> std::__atomic_wait_address(&_M_counter, __pred, __vfn, true); >> } >> >> + // Try to acquire the semaphore, retrying a small number of times >> + // in case of contention. >> bool >> _M_try_acquire() noexcept >> { >> @@ -116,23 +108,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION >> bool >> _M_try_acquire_until(const chrono::time_point<_Clock, _Duration>& >> __atime) noexcept >> { >> - auto const __vfn = [this]{ return _M_get_current(); }; >> - auto __val = __vfn(); >> - auto const __pred = [&__val](__count_type __cur) { >> - if (__cur > 0) >> - { >> - __val = __cur; >> - return true; >> - } >> - return false; >> - }; >> - while (!_M_do_try_acquire(__val)) >> - if (__val == 0) >> - { >> - if (!std::__atomic_wait_address_until(&_M_counter, __pred, >> - __vfn, __atime, true)) >> - return false; // timed out >> - } >> + auto __vfn = [this]{ return _M_get_current(); }; >> + _Can_acquire __pred{__vfn()}; >> + while (!_M_do_try_acquire(__pred._M_val)) >> + if (__pred._M_val == 0) >> + if (!std::__atomic_wait_address_until(&_M_counter, __pred, >> + __vfn, __atime, true)) >> + return false; // timed out >> return true; >> } >> >> @@ -140,23 +122,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION >> bool >> _M_try_acquire_for(const chrono::duration<_Rep, _Period>& __rtime) >> noexcept >> { >> - auto const __vfn = [this]{ return _M_get_current(); }; >> - auto __val = __vfn(); >> - auto const __pred = [&__val](__count_type __cur) { >> - if (__cur > 0) >> - { >> - __val = __cur; >> - return true; >> - } >> - return false; >> - }; >> - while (!_M_do_try_acquire(__val)) >> - if (__val == 0) >> - { >> - if (!std::__atomic_wait_address_for(&_M_counter, __pred, >> - __vfn, __rtime, true)) >> - return false; // timed out >> - } >> + auto __vfn = [this]{ return _M_get_current(); }; >> + _Can_acquire __pred{__vfn()}; >> + while (!_M_do_try_acquire(__pred._M_val)) >> + if (__pred._M_val == 0) > > I think we could have some named member funciton here, > like __pred.should_wait(). But not strictly necessary.
I like the suggestion. How about naming the predicate struct differently: struct _Available { __count_type _M_val; // Cache of the last value loaded from _M_counter. // Argument should be the latest value of the counter. // Returns true (and caches the value) if it's non-zero and so can be // decremented to acquire the semaphore. Returns false otherwise. bool operator()(__count_type __cur) noexcept { if (__cur == 0) return false; _M_val = __cur; return true; } // Returns true if the last-known value is non-zero. explicit operator bool() const noexcept { _M_val > 0; } }; And then it would be used like this: void _M_acquire() noexcept { auto __vfn = [this]{ return _M_get_current(); }; _Available __is_available{__vfn()}; while (!_M_do_try_acquire(__is_available._M_val)) if (!__is_available) std::__atomic_wait_address(&_M_counter, __is_available, __vfn, true); } So __is_available(vfn()) tests whether it's available based on the new value returned by vfn(), which is the predicate used by the atomic waiting functions, and (bool)__is_available tests whether it's available based on the cached value, which is what we use to decide whether to call the waiting function. Too cute? I think it makes the logic quite readable.