On Thu, 5 Jun 2025 at 18:58, Tomasz Kaminski <tkami...@redhat.com> wrote: > > > > On Thu, Jun 5, 2025 at 6:44 PM Jonathan Wakely <jwak...@redhat.com> wrote: >> >> On Thu, 5 Jun 2025 at 15:42, Tomasz Kaminski <tkami...@redhat.com> wrote: >> > >> > >> > >> > On Thu, Jun 5, 2025 at 12:41 PM Jonathan Wakely <jwak...@redhat.com> wrote: >> >> >> >> On Thu, 5 Jun 2025 at 10:00, Tomasz Kaminski <tkami...@redhat.com> wrote: >> >> > >> >> > >> >> > >> >> > On Thu, Jun 5, 2025 at 12:06 AM Jonathan Wakely <jwak...@redhat.com> >> >> > wrote: >> >> >> >> >> >> Rename __semaphore_base to __semaphore_impl, because it's not used as a >> >> >> base class. Replace the three identical lambda expressions with a named >> >> >> class, __semaphore_impl::_Can_acquire, which stores the most recent >> >> >> value of the counter as a data member. >> >> >> >> >> >> Add a new __platform_semaphore_impl class template which uses >> >> >> __platform_wait_t for the counter and uses more efficient atomic waits >> >> >> for the acquire functions, when __platform_wait is available. For a >> >> >> binary semaphore some members are further optimized because we know the >> >> >> counter can only be zero or one. >> >> >> >> >> >> Also add a bare wait flag to __atomic_wait_address_v, for consistency >> >> >> with __atomic_wait_address_until_v and __atomic_wait_address_for_v and >> >> >> to allow semaphores to use it without the redundant overhead of >> >> >> tracking >> >> >> waiters. >> >> >> >> >> >> libstdc++-v3/ChangeLog: >> >> >> >> >> >> * include/bits/atomic_wait.h (__atomic_wait_address_v): Add >> >> >> bare >> >> >> wait flag. >> >> >> * include/bits/semaphore_base.h (__semaphore_base): Rename to >> >> >> __semaphore_impl. Replace local variable and predicate lambda >> >> >> with _Can_acquire struct. >> >> >> (__platform_semaphore_impl): New class template. >> >> >> (__semaphore_impl): Remove alias template. >> >> >> * include/std/semaphore: >> >> >> --- >> >> >> >> >> >> This replaces the two patches: >> >> >> >> >> >> [PATCH 3/4] libstdc++: Optimize std::binary_semaphore >> >> >> [PATCH 4/4] libstdc++: Optimize std::counting_semaphore for futex path >> >> >> >> >> >> The problem with those patches was that the "optimized" implementations >> >> >> were used if _Max <= numeric_limits<__platform_wait_t>::max(), which is >> >> >> always true for many non-futex targets because we do: >> >> >> >> >> >> # if ATOMIC_LONG_LOCK_FREE == 2 >> >> >> using __platform_wait_t = unsigned long; >> >> >> # else >> >> >> using __platform_wait_t = unsigned int; >> >> >> # endif >> >> >> >> >> >> The earlier patches should have been checking either >> >> >> _GLIBCXX_HAVE_PLATFORM_WAIT or the __platform_wait_uses_type variable >> >> >> template to see if an efficient (i.e. not condition_variable-based) >> >> >> implementation is actually supported. It's not useful to check if the >> >> >> semaphore's maximum value fits in the __platform_wait_t type if that >> >> >> type doesn't have fast wait/wake operations like a futex. >> >> >> >> >> >> The new proposal in this patch only selects the optimized >> >> >> implementations if they can be implemented using __platform_wait. The >> >> >> __semaphore_impl always uses the portable atomic wait API (using an >> >> >> accessor function and predicate). As Tomasz suggested, a new class >> >> >> template provides the optimized implementation for both >> >> >> counting_semaphore and binary_semaphore, differing only in >> >> >> _M_get_current and _M_try_acquire. >> >> >> >> >> >> Tested x86_64-linux and sparc-solaris. >> >> > >> >> > LGTM, one major suggestion to encapsulate the __pred._M_cur == 0 into >> >> > named member >> >> > function. >> >> > I have also checked that __atomic_wait_address_until takes __pred by >> >> > reference, >> >> > and does not make copies, so encapsulating __cur there is OK. >> >> >> >> >> >> >> >> >> libstdc++-v3/include/bits/atomic_wait.h | 6 +- >> >> >> libstdc++-v3/include/bits/semaphore_base.h | 241 +++++++++++++++------ >> >> >> libstdc++-v3/include/std/semaphore | 5 +- >> >> >> 3 files changed, 186 insertions(+), 66 deletions(-) >> >> >> >> >> >> diff --git a/libstdc++-v3/include/bits/atomic_wait.h >> >> >> b/libstdc++-v3/include/bits/atomic_wait.h >> >> >> index 815726c16ccb..d32fda1c2435 100644 >> >> >> --- a/libstdc++-v3/include/bits/atomic_wait.h >> >> >> +++ b/libstdc++-v3/include/bits/atomic_wait.h >> >> >> @@ -249,12 +249,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION >> >> >> // C++26 will return __val >> >> >> } >> >> >> >> >> >> + // Wait on __addr while *__addr == __old is true. >> >> >> + // Cannot be used for proxy waits. >> >> >> inline void >> >> >> __atomic_wait_address_v(const __detail::__platform_wait_t* __addr, >> >> >> __detail::__platform_wait_t __old, >> >> >> - int __order) >> >> >> + int __order, bool __bare_wait = false) >> >> >> { >> >> >> - __detail::__wait_args __args{ __addr, __old, __order }; >> >> >> + __detail::__wait_args __args{ __addr, __old, __order, __bare_wait >> >> >> }; >> >> >> // C++26 will not ignore the return value here >> >> >> __detail::__wait_impl(__addr, __args); >> >> >> } >> >> >> diff --git a/libstdc++-v3/include/bits/semaphore_base.h >> >> >> b/libstdc++-v3/include/bits/semaphore_base.h >> >> >> index 3f7a33ccd51a..a1311e8690c7 100644 >> >> >> --- a/libstdc++-v3/include/bits/semaphore_base.h >> >> >> +++ b/libstdc++-v3/include/bits/semaphore_base.h >> >> >> @@ -46,23 +46,20 @@ namespace std _GLIBCXX_VISIBILITY(default) >> >> >> { >> >> >> _GLIBCXX_BEGIN_NAMESPACE_VERSION >> >> >> >> >> >> - template<bool _Platform_wait> >> >> >> - struct __semaphore_base >> >> >> + struct __semaphore_impl >> >> >> { >> >> >> - using __count_type = __conditional_t<_Platform_wait, >> >> >> - __detail::__platform_wait_t, >> >> >> - ptrdiff_t>; >> >> >> + using __count_type = ptrdiff_t; >> >> >> >> >> >> static constexpr ptrdiff_t _S_max >> >> >> = __gnu_cxx::__int_traits<__count_type>::__max; >> >> >> >> >> >> constexpr explicit >> >> >> - __semaphore_base(__count_type __count) noexcept >> >> >> + __semaphore_impl(__count_type __count) noexcept >> >> >> : _M_counter(__count) >> >> >> { } >> >> >> >> >> >> - __semaphore_base(const __semaphore_base&) = delete; >> >> >> - __semaphore_base& operator=(const __semaphore_base&) = delete; >> >> >> + __semaphore_impl(const __semaphore_impl&) = delete; >> >> >> + __semaphore_impl& operator=(const __semaphore_impl&) = delete; >> >> >> >> >> >> // Load the current counter value. >> >> >> _GLIBCXX_ALWAYS_INLINE __count_type >> >> >> @@ -71,8 +68,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION >> >> >> >> >> >> // Try to acquire the semaphore (i.e. decrement the counter). >> >> >> // Returns false if the current counter is zero, or if another >> >> >> thread >> >> >> - // decrements the value first. In the latter case, __cur is set >> >> >> to the >> >> >> - // new value. >> >> >> + // changes the value first. In the latter case, __cur is set to >> >> >> the new >> >> >> + // value. >> >> >> _GLIBCXX_ALWAYS_INLINE bool >> >> >> _M_do_try_acquire(__count_type& __cur) noexcept >> >> >> { >> >> >> @@ -85,24 +82,19 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION >> >> >> >> >> >> memory_order::relaxed); >> >> >> } >> >> >> >> >> >> + // Keep trying to acquire the semaphore in a loop until it >> >> >> succeeds. >> >> >> void >> >> >> _M_acquire() noexcept >> >> >> { >> >> >> - auto const __vfn = [this]{ return _M_get_current(); }; >> >> >> - auto __val = __vfn(); >> >> >> - auto const __pred = [&__val](__count_type __cur) { >> >> >> - if (__cur > 0) >> >> >> - { >> >> >> - __val = __cur; >> >> >> - return true; >> >> >> - } >> >> >> - return false; >> >> >> - }; >> >> >> - while (!_M_do_try_acquire(__val)) >> >> >> - if (__val == 0) >> >> >> + auto __vfn = [this]{ return _M_get_current(); }; >> >> >> + _Can_acquire __pred{__vfn()}; >> >> >> + while (!_M_do_try_acquire(__pred._M_val)) >> >> >> + if (__pred._M_val == 0) >> >> >> std::__atomic_wait_address(&_M_counter, __pred, __vfn, true); >> >> >> } >> >> >> >> >> >> + // Try to acquire the semaphore, retrying a small number of times >> >> >> + // in case of contention. >> >> >> bool >> >> >> _M_try_acquire() noexcept >> >> >> { >> >> >> @@ -116,23 +108,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION >> >> >> bool >> >> >> _M_try_acquire_until(const chrono::time_point<_Clock, >> >> >> _Duration>& __atime) noexcept >> >> >> { >> >> >> - auto const __vfn = [this]{ return _M_get_current(); }; >> >> >> - auto __val = __vfn(); >> >> >> - auto const __pred = [&__val](__count_type __cur) { >> >> >> - if (__cur > 0) >> >> >> - { >> >> >> - __val = __cur; >> >> >> - return true; >> >> >> - } >> >> >> - return false; >> >> >> - }; >> >> >> - while (!_M_do_try_acquire(__val)) >> >> >> - if (__val == 0) >> >> >> - { >> >> >> - if (!std::__atomic_wait_address_until(&_M_counter, >> >> >> __pred, >> >> >> - __vfn, __atime, >> >> >> true)) >> >> >> - return false; // timed out >> >> >> - } >> >> >> + auto __vfn = [this]{ return _M_get_current(); }; >> >> >> + _Can_acquire __pred{__vfn()}; >> >> >> + while (!_M_do_try_acquire(__pred._M_val)) >> >> >> + if (__pred._M_val == 0) >> >> >> + if (!std::__atomic_wait_address_until(&_M_counter, __pred, >> >> >> + __vfn, __atime, >> >> >> true)) >> >> >> + return false; // timed out >> >> >> return true; >> >> >> } >> >> >> >> >> >> @@ -140,23 +122,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION >> >> >> bool >> >> >> _M_try_acquire_for(const chrono::duration<_Rep, _Period>& >> >> >> __rtime) noexcept >> >> >> { >> >> >> - auto const __vfn = [this]{ return _M_get_current(); }; >> >> >> - auto __val = __vfn(); >> >> >> - auto const __pred = [&__val](__count_type __cur) { >> >> >> - if (__cur > 0) >> >> >> - { >> >> >> - __val = __cur; >> >> >> - return true; >> >> >> - } >> >> >> - return false; >> >> >> - }; >> >> >> - while (!_M_do_try_acquire(__val)) >> >> >> - if (__val == 0) >> >> >> - { >> >> >> - if (!std::__atomic_wait_address_for(&_M_counter, __pred, >> >> >> - __vfn, __rtime, >> >> >> true)) >> >> >> - return false; // timed out >> >> >> - } >> >> >> + auto __vfn = [this]{ return _M_get_current(); }; >> >> >> + _Can_acquire __pred{__vfn()}; >> >> >> + while (!_M_do_try_acquire(__pred._M_val)) >> >> >> + if (__pred._M_val == 0) >> >> > >> >> > I think we could have some named member funciton here, >> >> > like __pred.should_wait(). But not strictly necessary. >> >> >> >> I like the suggestion. How about naming the predicate struct differently: >> >> >> >> struct _Available >> >> { >> >> __count_type _M_val; // Cache of the last value loaded from >> >> _M_counter. >> >> >> >> // Argument should be the latest value of the counter. >> >> // Returns true (and caches the value) if it's non-zero and so can >> >> be >> >> // decremented to acquire the semaphore. Returns false otherwise. >> >> bool operator()(__count_type __cur) noexcept >> >> { >> >> if (__cur == 0) >> >> return false; >> >> _M_val = __cur; >> >> return true; >> >> } >> >> >> >> // Returns true if the last-known value is non-zero. >> >> explicit operator bool() const noexcept { _M_val > 0; } >> >> }; >> >> >> >> And then it would be used like this: >> >> >> >> void >> >> _M_acquire() noexcept >> >> { >> >> auto __vfn = [this]{ return _M_get_current(); }; >> >> _Available __is_available{__vfn()}; >> >> while (!_M_do_try_acquire(__is_available._M_val)) >> >> if (!__is_available) >> > >> > I think the bool conversion here is too cute, the two uses look "simialar". >> >> I don't see a problem with that, they *are* similar. One tests the >> last known value, the other updates the last-known value and then >> tests it. >> >> The operator() function could be implemented as: >> >> bool operator()(count_type cur) >> { >> _M_val = cur; >> return static_cast<bool>(*this); >> } >> >> That would be equivalent, but it's implemented slightly differently to >> avoid a redundant store (don't bother to store zero, because it must >> already be zero if the predicate was being tested by the waiting >> function). >> >> I did consider naming the operator bool() as operator()() instead, so >> the callable could be used with one argument (to update the cache and >> test it) and with no arguments (to test the previously cached value). > > I think the root of my concern was that, if is_available is used as a > variable, then it looks like > we are calling __atomic_wait_address overload that accepts a value.
Ah yes, I see. > Having operator()(), > and using it uniformly as a functor/predicate alleviates this. OK great. One last question then: >> Then it would be used like this: >> >> void >> _M_acquire() noexcept >> { >> auto __vfn = [this]{ return _M_get_current(); }; >> _Available __is_available{__vfn()}; Would this initialization be better if written like this: _Available __is_available{ ._M_val = __vfn() }; I think it makes the use of _M_val on the next line a bit clearer: >> while (!_M_do_try_acquire(__is_available._M_val)) Although maybe it's not a big deal, because the definition of _Available and its _M_val member are only a few lines further down in the same file, not far away in some other header. >> if (!__is_available()) >> std::__atomic_wait_address(&_M_counter, __is_available, __vfn, >> true); >> } >> >> > Maye rename use a named member like "current()" or "last()", or >> > "is_available.currently()". >> >> >> >> std::__atomic_wait_address(&_M_counter, __is_available, __vfn, >> >> true); >> >> } >> >> >> >> So __is_available(vfn()) tests whether it's available based on the new >> >> value returned by vfn(), which is the predicate used by the atomic >> >> waiting functions, and (bool)__is_available tests whether it's >> >> available based on the cached value, which is what we use to decide >> >> whether to call the waiting function. >> >> >> >> Too cute? I think it makes the logic quite readable. >> >> >>