On Mon, Nov 17, 2025 at 10:10 AM Tomasz Kaminski <[email protected]> wrote:
> On Sun, Nov 16, 2025 at 1:56 AM Jonathan Wakely <[email protected]> wrote:
>
>> This will allow us to extend atomic waiting functions to support a
>> possible future 64-bit version of futex, as well as supporting
>> futex-like wait/wake primitives on other targets (e.g. macOS has
>> os_sync_wait_on_address and FreeBSD has _umtx_op).
>>
>> Before this change, the decision of whether to do a proxy wait or to
>> wait on the atomic variable itself was made in the header at
>> compile-time, which makes it an ABI property that would not have been
>> possible to change later. That would have meant that
>> std::atomic<uint64_t> would always have to do a proxy wait even if Linux
>> gains support for 64-bit futex2(2) calls at some point in the future.
>> The disadvantage of proxy waits is that several distinct atomic objects
>> can share the same proxy state, leading to contention between threads
>> even when they are not waiting on the same atomic object, similar to
>> false sharing. It also results in spurious wake-ups because doing a
>> notify on an atomic object that uses a proxy wait will wake all waiters
>> sharing the proxy.
>>
>> For types that are known to definitely not need a proxy wait (e.g. int
>> on Linux) the header can still choose a more efficient path at
>> compile-time. But for other types, the decision of whether to do a proxy
>> wait is deferred to runtime, inside the library internals. This will
>> make it possible for future versions of libstdc++.so to extend the set
>> of types which don't need to use proxy waits, without ABI changes.
>>
>> The way the change works is to stop using the __proxy_wait flag that was
>> set by the inline code in the headers. Instead the __wait_args struct
>> has an extra pointer member which the library internals populate with
>> either the address of the atomic object or the _M_ver counter in the
>> proxy state.
>> There is also a new _M_obj_size member which stores the
>> size of the atomic object, so that the library can decide whether a
>> proxy is needed. So for example if Linux gains 64-bit futex support then
>> the library can decide not to use a proxy when _M_obj_size == 8.
>> Finally, the _M_old member of the __wait_args struct is changed to
>> uint64_t so that it has room to store 64-bit values, not just whatever
>> size the __platform_wait_t type is (which is a 32-bit int on Linux).
>> Similarly, the _M_val member of __wait_result_type changes to uint64_t
>> too.
>>
>> libstdc++-v3/ChangeLog:
>>
>>     * config/abi/pre/gnu.ver: Adjust exports.
>>     * include/bits/atomic_timed_wait.h
>>     (_GLIBCXX_HAVE_PLATFORM_TIMED_WAIT): Do not define this macro.
>>     (__atomic_wait_address_until_v, __atomic_wait_address_for_v):
>>     Guard assertions with #ifdef _GLIBCXX_UNKNOWN_PLATFORM_WAIT.
>>     * include/bits/atomic_wait.h (__platform_wait_uses_type):
>>     Define separately for platforms with and without platform
>>     wait.
>>     (_GLIBCXX_HAVE_PLATFORM_WAIT): Do not define this macro.
>>     (_GLIBCXX_UNKNOWN_PLATFORM_WAIT): Define new macro.
>>     (__wait_value_type): New typedef.
>>     (__wait_result_type): Change _M_val to __wait_value_type.
>>     (__wait_args_base::_M_old): Change to __wait_value_type.
>>     (__wait_args_base::_M_obj, __wait_args_base::_M_obj_size): New
>>     data members.
>>     (__wait_args::__wait_args): Set _M_obj and _M_obj_size on
>>     construction.
>>     (__wait_args::_M_setup_wait): Change void* parameter to deduced
>>     type. Use _S_bit_cast instead of __builtin_bit_cast.
>>     (__wait_args::_M_load_proxy_wait_val): Remove function, replace
>>     with ...
>>     (__wait_args::_M_setup_wait_impl): New function.
>>     (__wait_args::_S_bit_cast): Wrapper for __builtin_bit_cast which
>>     also supports conversion from 32-bit values.
>>     (__wait_args::_S_flags_for): Do not set __proxy_wait flag.
>>     (__atomic_wait_address_v): Guard assertions with #ifdef
>>     _GLIBCXX_UNKNOWN_PLATFORM_WAIT.
>>     * src/c++20/atomic.cc (_GLIBCXX_HAVE_PLATFORM_WAIT): Define here
>>     instead of in header. Check _GLIBCXX_HAVE_PLATFORM_WAIT instead
>>     of _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT.
>>     (__spin_impl): Adjust for 64-bit __wait_args_base::_M_old.
>>     (use_proxy_wait): New function.
>>     (__wait_args::_M_load_proxy_wait_val): Replace with ...
>>     (__wait_args::_M_setup_wait_impl): New function. Call
>>     use_proxy_wait to decide at runtime whether to wait on the
>>     pointer directly instead of using a proxy. If a proxy is needed,
>>     set _M_obj to point to its _M_ver member. Adjust for change to
>>     type of _M_old.
>>     (__wait_impl): Wait on _M_obj unconditionally.
>>     (__notify_impl): Call use_proxy_wait to decide whether to notify
>>     on the address parameter or a proxy.
>>     (__spin_until_impl): Adjust for change to type of _M_val.
>>     (__wait_until_impl): Wait on _M_obj unconditionally.
>> ---
>>
>> Tested x86_64-linux, powerpc64le-linux, sparc-solaris.
>
> A lot of comments below.
>
>> I think this is an important change which I unfortunately didn't think of
>> until recently.
>>
>> This changes the exports from the shared library, but we're still in
>> stage 1 so I think that should be allowed (albeit unfortunate). Nobody
>> should be expecting GCC 16 to be stable yet.
>>
>> The __proxy_wait enumerator is now unused and could be removed. The
>> __abi_version enumerator could also be bumped to indicate the
>> incompatibility with earlier snapshots of GCC 16, but I don't think that
>> is needed. We could in theory keep the old symbol export
>> (__wait_args::_M_load_proxy_wait_val) and make it trap/abort if called,
>> but I'd prefer to just remove it and cause dynamic linker errors instead.
>>
>> There's a TODO in the header about which types should be allowed to take
>> the optimized paths (see the __waitable concept). For types where that's
>> true, if the size matches a futex then we'll use a futex, even if it's
>> actually an enum or floating-point type (or pointer on 32-bit targets).
>> I'm not sure if that's safe.
>>
>>  libstdc++-v3/config/abi/pre/gnu.ver           |   3 +-
>>  libstdc++-v3/include/bits/atomic_timed_wait.h |  12 +-
>>  libstdc++-v3/include/bits/atomic_wait.h       | 109 +++++++++-----
>>  libstdc++-v3/src/c++20/atomic.cc              | 140 +++++++++++-------
>>  4 files changed, 166 insertions(+), 98 deletions(-)
>>
>> diff --git a/libstdc++-v3/config/abi/pre/gnu.ver b/libstdc++-v3/config/abi/pre/gnu.ver
>> index 2e48241d51f9..3c2bd4921730 100644
>> --- a/libstdc++-v3/config/abi/pre/gnu.ver
>> +++ b/libstdc++-v3/config/abi/pre/gnu.ver
>> @@ -2553,7 +2553,8 @@ GLIBCXX_3.4.35 {
>>      _ZNSt8__detail11__wait_implEPKvRNS_16__wait_args_baseE;
>>      _ZNSt8__detail13__notify_implEPKvbRKNS_16__wait_args_baseE;
>>      _ZNSt8__detail17__wait_until_implEPKvRNS_16__wait_args_baseERKNSt6chrono8durationI[lx]St5ratioIL[lx]1EL[lx]1000000000EEEE;
>> -    _ZNSt8__detail11__wait_args22_M_load_proxy_wait_valEPKv;
>> +    _ZNSt8__detail11__wait_args18_M_setup_wait_implEPKv;
>> +    _ZNSt8__detail11__wait_args20_M_setup_notify_implEPKv;
>>
>>      # std::chrono::gps_clock::now, tai_clock::now
>>      _ZNSt6chrono9gps_clock3nowEv;
>> diff --git a/libstdc++-v3/include/bits/atomic_timed_wait.h b/libstdc++-v3/include/bits/atomic_timed_wait.h
>> index 30f7ff616840..918a267d10eb 100644
>> --- a/libstdc++-v3/include/bits/atomic_timed_wait.h
>> +++ b/libstdc++-v3/include/bits/atomic_timed_wait.h
>> @@ -75,14 +75,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>        return chrono::ceil<__w_dur>(__atime);
>>      }
>>
>> -#ifdef _GLIBCXX_HAVE_LINUX_FUTEX
>> -#define _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT
>> -#else
>> -// define _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT and implement __platform_wait_until
>> -// if there is a more efficient primitive supported by the platform
>> -// (e.g. __ulock_wait) which is better than pthread_cond_clockwait.
>> -#endif // ! HAVE_LINUX_FUTEX
>> -
>>    __wait_result_type
>>    __wait_until_impl(const void* __addr, __wait_args_base& __args,
>>                      const __wait_clock_t::duration& __atime);
>> @@ -156,7 +148,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>                              const chrono::time_point<_Clock, _Dur>& __atime,
>>                              bool __bare_wait = false) noexcept
>>      {
>> -#ifndef _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT
>> +#ifdef _GLIBCXX_UNKNOWN_PLATFORM_WAIT
>>        __glibcxx_assert(false); // This function can't be used for proxy wait.
>> #endif
>>        __detail::__wait_args __args{ __addr, __old, __order, __bare_wait };
>> @@ -208,7 +200,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>                              const chrono::duration<_Rep, _Period>& __rtime,
>>                              bool __bare_wait = false) noexcept
>>      {
>> -#ifndef _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT
>> +#ifdef _GLIBCXX_UNKNOWN_PLATFORM_WAIT
>
> This name really reads strange, and sounds like something with "TODO".
> I think _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT was just an OK name, even
> if it was not used directly.
>
>>        __glibcxx_assert(false); // This function can't be used for proxy wait.
>> #endif
>>        __detail::__wait_args __args{ __addr, __old, __order, __bare_wait };
>> diff --git a/libstdc++-v3/include/bits/atomic_wait.h b/libstdc++-v3/include/bits/atomic_wait.h
>> index 95151479c120..49369419d6a6 100644
>> --- a/libstdc++-v3/include/bits/atomic_wait.h
>> +++ b/libstdc++-v3/include/bits/atomic_wait.h
>> @@ -45,35 +45,34 @@
>>  namespace std _GLIBCXX_VISIBILITY(default)
>>  {
>>  _GLIBCXX_BEGIN_NAMESPACE_VERSION
>> +#if defined _GLIBCXX_HAVE_LINUX_FUTEX
>>  namespace __detail
>>  {
>> -#ifdef _GLIBCXX_HAVE_LINUX_FUTEX
>> -#define _GLIBCXX_HAVE_PLATFORM_WAIT 1
>>    using __platform_wait_t = int;
>>    inline constexpr size_t __platform_wait_alignment = 4;
>> +  }
>> +  template<typename _Tp>
>> +    inline constexpr bool __platform_wait_uses_type
>> +      = is_scalar_v<_Tp> && sizeof(_Tp) == sizeof(int) && alignof(_Tp) >= 4;
>>  #else
>> +# define _GLIBCXX_UNKNOWN_PLATFORM_WAIT 1
>>  // define _GLIBCX_HAVE_PLATFORM_WAIT and implement __platform_wait()
>>  // and __platform_notify() if there is a more efficient primitive supported
>>  // by the platform (e.g. __ulock_wait()/__ulock_wake()) which is better than
>>  // a mutex/condvar based wait.
>> +  namespace __detail
>> +  {
>>  # if ATOMIC_LONG_LOCK_FREE == 2
>>    using __platform_wait_t = unsigned long;
>>  # else
>>    using __platform_wait_t = unsigned int;
>>  # endif
>>    inline constexpr size_t __platform_wait_alignment
>> -    = __alignof__(__platform_wait_t);
>> -#endif
>> +    = sizeof(__platform_wait_t) < __alignof__(__platform_wait_t)
>> +      ? __alignof__(__platform_wait_t) : sizeof(__platform_wait_t);
>>  } // namespace __detail
>> -
>> -  template<typename _Tp>
>> -    inline constexpr bool __platform_wait_uses_type
>> -#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
>> -      = is_scalar_v<_Tp>
>> -        && ((sizeof(_Tp) == sizeof(__detail::__platform_wait_t))
>> -        && (alignof(_Tp) >= __detail::__platform_wait_alignment));
>> -#else
>> -      = false;
>> +  template<typename>
>> +    inline constexpr bool __platform_wait_uses_type = false;
>>  #endif
>>
>>  namespace __detail
>>  {
>> @@ -105,10 +104,19 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>        return __builtin_memcmp(&__a, &__b, sizeof(_Tp)) == 0;
>>      }
>>
>> -  // lightweight std::optional<__platform_wait_t>
>> +  // TODO: this needs to be false for types with padding, e.g. __int20.
>
> I do not understand why this needs to be required. This function is used
> only via atomic or atomic_ref. For atomic, we can guarantee that the type
> has padding bytes cleared.
>
>> +  // TODO: should this be true only for integral, enum, and pointer types?
>
> What I think is missing here is alignment. I assume that any platform wait
> may use bits that are clear due to the alignment of the platform wait type
> for some internal state. Or are we going to check sufficient alignment in
> the .cc file, and use a different kind of wait depending on the object?
>
> But I think we can later safely extend or change what is waitable (except
> extending it past 8 bytes), as if we start setting _M_obj_size to non-zero,
> the impl may use a platform wait.
>
> After pounding a bit on this, I realized that this requirement will be set
> in stone after shipping, because the old TU calling wait needs to agree
> with the new TU calling notify, or vice versa. I.e. the set of types that
> are waitable needs to be the same on both sides.
>
> So, I will go with the safe option: integral, pointer, or enum. This would
> also give us the no-padding guarantee, I assume?
>
>> +  template<typename _Tp>
>> +    concept __waitable
>> +      = is_scalar_v<_Tp> && (sizeof(_Tp) <= sizeof(__UINT64_TYPE__));
>> +
>> +  // Storage for up to 64 bits of value, should be considered opaque bits.
>> +  using __wait_value_type = __UINT64_TYPE__;
>> +
>> +  // lightweight std::optional<__wait_value_type>
>>    struct __wait_result_type
>>    {
>> -    __platform_wait_t _M_val;
>> +    __wait_value_type _M_val;
>>      unsigned char _M_has_val : 1;  // _M_val value was loaded before return.
>>      unsigned char _M_timeout : 1;  // Waiting function ended with timeout.
>>      unsigned char _M_unused : 6;   // padding
>> @@ -143,8 +151,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>    {
>>      __wait_flags _M_flags;
>>      int _M_order = __ATOMIC_ACQUIRE;
>> -    __platform_wait_t _M_old = 0;
>> +    __wait_value_type _M_old{};
>>      void* _M_wait_state = nullptr;
>> +    const void* _M_obj = nullptr;   // The address of the object to wait on.
>> +    unsigned char _M_obj_size = 0;  // The size of that object.
>>
>>      // Test whether _M_flags & __flags is non-zero.
>>      bool
>> @@ -162,36 +172,48 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>      explicit
>>      __wait_args(const _Tp* __addr, bool __bare_wait = false) noexcept
>>      : __wait_args_base{ _S_flags_for(__addr, __bare_wait) }
>> -    { }
>> +    {
>> +      _M_obj = __addr; // Might be replaced by _M_setup_wait
>> +      if constexpr (__waitable<_Tp>)
>> +        // __wait_impl might be able to wait directly on __addr
>> +        // instead of using a proxy, depending on its size.
>> +        _M_obj_size = sizeof(_Tp);
>> +    }
>>
>>      __wait_args(const __platform_wait_t* __addr, __platform_wait_t __old,
>>                  int __order, bool __bare_wait = false) noexcept
>> -    : __wait_args_base{ _S_flags_for(__addr, __bare_wait), __order, __old }
>> -    { }
>> +    : __wait_args(__addr, __bare_wait)
>> +    {
>> +      _M_order = __order;
>> +      _M_old = __old;
>> +    }
>>
>>      __wait_args(const __wait_args&) noexcept = default;
>>      __wait_args& operator=(const __wait_args&) noexcept = default;
>>
>> -    template<typename _ValFn,
>> -             typename _Tp = decay_t<decltype(std::declval<_ValFn&>()())>>
>> +    template<typename _Tp, typename _ValFn>
>>        _Tp
>> -      _M_setup_wait(const void* __addr, _ValFn __vfn,
>> +      _M_setup_wait(const _Tp* __addr, _ValFn __vfn,
>>                      __wait_result_type __res = {})
>>        {
>> +        static_assert(is_same_v<_Tp, decay_t<decltype(__vfn())>>);
>> +
>>         if constexpr (__platform_wait_uses_type<_Tp>)
>>           {
>> -            // If the wait is not proxied, the value we check when waiting
>> -            // is the value of the atomic variable itself.
>> +            // If we know for certain that this type can be waited on
>> +            // efficiently using something like a futex syscall,
>> +            // then we can avoid the overhead of _M_setup_wait_impl
>> +            // and just load the value and store it into _M_old.
>>
>> -            if (__res._M_has_val) // The previous wait loaded a recent value.
>> +            if (__res._M_has_val) // A previous wait loaded a recent value.
>>               {
>>                 _M_old = __res._M_val;
>> -                return __builtin_bit_cast(_Tp, __res._M_val);
>> +                return _S_bit_cast<_Tp>(_M_old);
>
> I am not sure if I understand which branch of _S_bit_cast this would use:
> we have neither sizeof(_To) == sizeof(_From) (i.e. 4 vs 8) nor
> sizeof(_From) == sizeof(__UINT32_TYPE__).
> I would much more prefer if this would be done as:
>   return __builtin_bit_cast(_Tp, static_cast<__UINT32_TYPE__>(_M_old));
>
>>               }
>>             else // Load the value from __vfn
>>               {
>> -                _Tp __val = __vfn();
>> -                _M_old = __builtin_bit_cast(__platform_wait_t, __val);
>> +                auto __val = __vfn();
>> +                _M_old = _S_bit_cast<__wait_value_type>(__val);
>
> And here:
>   _M_old = __builtin_bit_cast(__UINT32_TYPE__, __val);
> However, instead of __UINT32_TYPE__, we should use
> make_unsigned<__platform_wait_t>.
>
>>                 return __val;
>>               }
>>           }
>> @@ -199,16 +221,33 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>           {
>>             if (__res._M_has_val) // The previous wait loaded a recent value.
>>               _M_old = __res._M_val;
>> -            else // Load _M_ver from the proxy (must happen before __vfn()).
>> -              _M_load_proxy_wait_val(__addr);
>> +            else // Let the library decide how to setup the wait.
>> +              {
>> +                // Set _M_obj to the address to be waited on (either __addr
>> +                // or a proxy) and load its current value into _M_old.
>> +                _M_setup_wait_impl(__addr);
>> +              }
>>             return __vfn();
>>           }
>>        }
>>
>>    private:
>> -    // Populates _M_wait_state and _M_old from the proxy for __addr.
>> +    template<typename _To, typename _From>
>> +      [[__gnu__::__always_inline__]]
>> +      static _To
>> +      _S_bit_cast(_From __from)
>> +      {
>> +        if constexpr (sizeof(_To) == sizeof(_From))
>> +          return __builtin_bit_cast(_To, __from);
>> +        else if constexpr (sizeof(_From) == sizeof(__UINT32_TYPE__))
>> +          return __builtin_bit_cast(__UINT32_TYPE__, __from);
>> +        else
>> +          __builtin_unreachable();
>> +      }
>> +
>> +    // Populates _M_wait_state and _M_old appropriately for __addr.
>>      void
>> -    _M_load_proxy_wait_val(const void* __addr);
>> +    _M_setup_wait_impl(const void* __addr);
>>
>>    template<typename _Tp>
>>      static constexpr __wait_flags
>> @@ -218,8 +257,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>        __wait_flags __res = __abi_version | __do_spin;
>>        if (!__bare_wait)
>>          __res |= __track_contention;
>> -      if constexpr (!__platform_wait_uses_type<_Tp>)
>> -        __res |= __proxy_wait;
>> +      // if constexpr (!__platform_wait_uses_type<_Tp>)
>> +      //   __res |= __proxy_wait;
>
> Remove this simply.
>
>>        return __res;
>>      }
>>    };
>> @@ -255,7 +294,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>                                 __detail::__platform_wait_t __old,
>>                                 int __order, bool __bare_wait = false)
>>    {
>> -#ifndef _GLIBCXX_HAVE_PLATFORM_WAIT
>> +#ifdef _GLIBCXX_UNKNOWN_PLATFORM_WAIT
>>      __glibcxx_assert(false); // This function can't be used for proxy wait.
>> #endif
>>      __detail::__wait_args __args{ __addr, __old, __order, __bare_wait };
>> diff --git a/libstdc++-v3/src/c++20/atomic.cc b/libstdc++-v3/src/c++20/atomic.cc
>> index e280045b619d..d5fbd10ed11f 100644
>> --- a/libstdc++-v3/src/c++20/atomic.cc
>> +++ b/libstdc++-v3/src/c++20/atomic.cc
>> @@ -38,14 +38,7 @@
>>  # include <syscall.h>
>>  # include <bits/functexcept.h>
>>  # include <sys/time.h>
>> -#endif
>> -
>> -#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
>> -# ifndef _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT
>> -// __waitable_state assumes that we consistently use the same implementation
>> -// (i.e. futex vs mutex+condvar) for timed and untimed waiting.
>> -# error "This configuration is not currently supported"
>> -# endif
>> +# define _GLIBCXX_HAVE_PLATFORM_WAIT 1
>>  #endif
>>
>>  #pragma GCC diagnostic ignored "-Wmissing-field-initializers"
>> @@ -107,7 +100,7 @@ namespace
>>      // Count of threads blocked waiting on this state.
>>      alignas(_S_align) __platform_wait_t _M_waiters = 0;
>>
>> -#ifndef _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT
>> +#ifndef _GLIBCXX_HAVE_PLATFORM_WAIT
>>      mutex _M_mtx;
>>
>>      // This type meets the Cpp17BasicLockable requirements.
>> @@ -123,7 +116,7 @@ namespace
>>      // use this for waiting and notifying functions instead.
>>      alignas(_S_align) __platform_wait_t _M_ver = 0;
>>
>> -#ifndef _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT
>> +#ifndef _GLIBCXX_HAVE_PLATFORM_WAIT
>>      __condvar _M_cv;
>>  #endif
>>
>> @@ -216,17 +209,19 @@ namespace
>>    __spin_impl(const __platform_wait_t* __addr, const __wait_args_base& __args)
>>    {
>>      __platform_wait_t __val{};
>> +    __wait_value_type wval;
>>      for (auto __i = 0; __i < __atomic_spin_count; ++__i)
>>        {
>>          __atomic_load(__addr, &__val, __args._M_order);
>> -        if (__val != __args._M_old)
>> -          return { ._M_val = __val, ._M_has_val = true, ._M_timeout = false };
>> +        wval = static_cast<__wait_value_type>(__val);
>> +        if (wval != __args._M_old)
>> +          return { ._M_val = wval, ._M_has_val = true, ._M_timeout = false };
>>          if (__i < __atomic_spin_count_relax)
>>            __thread_relax();
>>          else
>>            __thread_yield();
>>        }
>> -    return { ._M_val = __val, ._M_has_val = true, ._M_timeout = true };
>> +    return { ._M_val = wval, ._M_has_val = true, ._M_timeout = true };
>>    }
>>
>>    inline __waitable_state*
>> @@ -237,32 +232,58 @@ namespace
>>      return static_cast<__waitable_state*>(args._M_wait_state);
>>    }
>>
>> +  [[gnu::always_inline]]
>> +  inline bool
>> +  use_proxy_wait(const __wait_args_base& args)
>
> This needs to take the address that we are waiting on and check if it's
> sufficiently aligned to __platform_wait_alignment.
> However, I do not think we should check uint64_t at all, as we do not have
> such a platform wait yet, and we will not change __platform_wait_uses_type
> in that case.
>
>> +  {
>> +    if constexpr (__platform_wait_uses_type<uint32_t>)
>> +      if (args._M_obj_size == sizeof(uint32_t))
>> +        return false;
>> +
>> +    if constexpr (__platform_wait_uses_type<uint64_t>)
>> +      if (args._M_obj_size == sizeof(uint64_t))
>> +        return false;
>> +
>> +    // Use proxy wait for everything else:
>> +    return true;
>> +  }
>> +
>>  } // namespace
>>
>> -// Called for a proxy wait
>> +// Set _M_wait_state if using proxy wait, or caller wants contention tracking.
>> +// Set _M_obj to &_M_wait_state->_M_ver if using proxy wait.
>> +// Load the current value from _M_obj and store in _M_val.
>>  void
>> -__wait_args::_M_load_proxy_wait_val(const void* addr)
>> +__wait_args::_M_setup_wait_impl(const void* addr)
>
> I would use _M_obj_size to indicate if proxy wait is used for the actual
> wait.
>
>>  {
>> -  // __glibcxx_assert( *this & __wait_flags::__proxy_wait );
>> +  if (!use_proxy_wait(*this))
>
> Do: _M_obj_size > 0 && !use_proxy_wait(*this)
>
>> +    {
>> +      // We can wait on this address directly.
>> +      __glibcxx_assert(_M_obj == addr);
>
> This function is the one that is expected to set _M_obj, so it seems
> incorrect to assert it here.
>
>> -  // We always need a waitable state for proxy waits.
>> +      int val;
>> +      __atomic_load(static_cast<const int*>(addr), &val, __ATOMIC_ACQUIRE);
>> +      _M_old = val;
>> +
>> +      return;
>> +    }
>> +
>> +  // This will be a proxy wait, so get a waitable state.
>>    auto state = set_wait_state(addr, *this);
>>
>> +  // The address we will wait on is the version count of the waitable state:
>> +  _M_obj = &state->_M_ver;
>> +
>>    // Read the value of the _M_ver counter.
>> -  __atomic_load(&state->_M_ver, &_M_old, __ATOMIC_ACQUIRE);
>> +  __platform_wait_t __old;
>> +  __atomic_load(&state->_M_ver, &__old, __ATOMIC_ACQUIRE);
>
> Nitpick, but I think this could be just:
>   __platform_wait_t __old = __atomic_load_n(&state->_M_ver, __ATOMIC_ACQUIRE);
>
>> +  _M_old = __old;
>
> Or just:
>   _M_old = __atomic_load_n(&state->_M_ver, __ATOMIC_ACQUIRE);
>
>>  }
>>
>>  __wait_result_type
>> -__wait_impl(const void* __addr, __wait_args_base& __args)
>> +__wait_impl([[maybe_unused]] const void* __addr, __wait_args_base& __args)
>>  {
>> -  auto __state = static_cast<__waitable_state*>(__args._M_wait_state);
>> -
>> -  const __platform_wait_t* __wait_addr;
>> -
>> -  if (__args & __wait_flags::__proxy_wait)
>> -    __wait_addr = &__state->_M_ver;
>> -  else
>> -    __wait_addr = static_cast<const __platform_wait_t*>(__addr);
>> +  auto* __wait_addr = static_cast<const __platform_wait_t*>(__args._M_obj);
>>
>>    if (__args & __wait_flags::__do_spin)
>>      {
>> @@ -286,6 +307,7 @@ __wait_impl(const void* __addr, __wait_args_base& __args)
>>    __atomic_load(__wait_addr, &__val, __args._M_order);
>>    if (__val == __args._M_old)
>>      {
>> +      auto __state = static_cast<__waitable_state*>(__args._M_wait_state);
>>        __state->_M_cv.wait(__state->_M_mtx);
>>        return { ._M_val = __val, ._M_has_val = false, ._M_timeout = false };
>>      }
>> @@ -294,24 +316,40 @@ __wait_impl(const void* __addr, __wait_args_base& __args)
>>  }
>>
>>  void
>> -__notify_impl(const void* __addr, [[maybe_unused]] bool __all,
>> +__notify_impl([[maybe_unused]] const void* __addr, [[maybe_unused]] bool __all,
>>                const __wait_args_base& __args)
>>  {
>> -  auto __state = static_cast<__waitable_state*>(__args._M_wait_state);
>> -  if (!__state)
>> -    __state = &__waitable_state::_S_state_for(__addr);
>> +  const bool __track_contention = __args & __wait_flags::__track_contention;
>> +  const bool proxy_wait = use_proxy_wait(__args);
>
> And here we could just check if _M_obj_size == 0, as this would be an
> indication of proxy wait.
>
>> -  [[maybe_unused]] const __platform_wait_t* __wait_addr;
>> +  [[maybe_unused]] auto* __wait_addr
>> +    = static_cast<const __platform_wait_t*>(__addr);
>> +
>> +#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
>> +  // Check whether it would be a non-proxy wait for this object.
>> +  // This condition must match the one in _M_setup_wait_impl to ensure that
>> +  // the address used for the notify matches the one used for the wait.
>> +  if (!proxy_wait)
>> +    {
>> +      if (__track_contention)
>> +        if (!__waitable_state::_S_state_for(__addr)._M_waiting())
>> +          return;
>> +
>> +      __platform_notify(__wait_addr, __all);
>> +      return;
>> +    }
>> +#endif
>> +
>> +  // Either a proxy wait or we don't have platform wait/wake primitives.
>> +
>> +  auto __state = &__waitable_state::_S_state_for(__addr);
>>
>>    // Lock mutex so that proxied waiters cannot race with incrementing _M_ver
>>    // and see the old value, then sleep after the increment and notify_all().
>>    lock_guard __l{ *__state };
>>
>> -  if (__args & __wait_flags::__proxy_wait)
>> +  if (proxy_wait)
>>      {
>> -      // Waiting for *__addr is actually done on the proxy's _M_ver.
>> -      __wait_addr = &__state->_M_ver;
>> -
>>        // Increment _M_ver so that waiting threads see something changed.
>>        // This has to be atomic because the load in _M_load_proxy_wait_val
>>        // is done without the mutex locked.
>> @@ -322,11 +360,11 @@ __notify_impl(const void* __addr, [[maybe_unused]] bool __all,
>>        // they can re-evaluate their conditions to see if they should
>>        // stop waiting or should wait again.
>>        __all = true;
>> -    }
>> -  else // Use the atomic variable's own address.
>> -    __wait_addr = static_cast<const __platform_wait_t*>(__addr);
>>
>> -  if (__args & __wait_flags::__track_contention)
>> +      __wait_addr = &__state->_M_ver;
>> +    }
>> +
>> +  if (__track_contention)
>>      {
>>        if (!__state->_M_waiting())
>>          return;
>> @@ -366,7 +404,7 @@ __platform_wait_until(const __platform_wait_t* __addr,
>>  }
>>  #endif // HAVE_LINUX_FUTEX
>>
>> -#ifndef _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT
>> +#ifndef _GLIBCXX_HAVE_PLATFORM_WAIT
>>  bool
>>  __cond_wait_until(__condvar& __cv, mutex& __mx,
>>                    const __wait_clock_t::time_point& __atime)
>> @@ -381,7 +419,7 @@ __cond_wait_until(__condvar& __cv, mutex& __mx,
>>      __cv.wait_until(__mx, __ts);
>>    return __wait_clock_t::now() < __atime;
>>  }
>> -#endif // ! HAVE_PLATFORM_TIMED_WAIT
>> +#endif // ! HAVE_PLATFORM_WAIT
>>
>>  // Unlike __spin_impl, does not always return _M_has_val == true.
>>  // If the deadline has already passed then no fresh value is loaded.
>> @@ -414,7 +452,9 @@ __spin_until_impl(const __platform_wait_t* __addr,
>>        return __res;
>>      }
>>
>> -  __atomic_load(__addr, &__res._M_val, __args._M_order);
>> +  __platform_wait_t val;
>> +  __atomic_load(__addr, &val, __args._M_order);
>
> Same comment on __atomic_load_n here.
>
>> +  __res._M_val = val;
>>    __res._M_has_val = true;
>>    if (__res._M_val != __args._M_old)
>>      {
>> @@ -428,16 +468,11 @@ __spin_until_impl(const __platform_wait_t* __addr,
>>  } // namespace
>>
>>  __wait_result_type
>> -__wait_until_impl(const void* __addr, __wait_args_base& __args,
>> +__wait_until_impl([[maybe_unused]] const void* __addr, __wait_args_base& __args,
>>                    const __wait_clock_t::duration& __time)
>>  {
>>    const __wait_clock_t::time_point __atime(__time);
>> -  auto __state = static_cast<__waitable_state*>(__args._M_wait_state);
>> -  const __platform_wait_t* __wait_addr;
>> -  if (__args & __wait_flags::__proxy_wait)
>> -    __wait_addr = &__state->_M_ver;
>> -  else
>> -    __wait_addr = static_cast<const __platform_wait_t*>(__addr);
>> +  auto* __wait_addr = static_cast<const __platform_wait_t*>(__args._M_obj);
>>
>>    if (__args & __wait_flags::__do_spin)
>>      {
>> @@ -448,9 +483,9 @@ __wait_until_impl(const void* __addr, __wait_args_base& __args,
>>        return __res;
>>      }
>>
>> -#ifdef _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT
>> +#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
>>    if (__args & __wait_flags::__track_contention)
>> -    set_wait_state(__addr, __args);
>> +    set_wait_state(__addr, __args); // scoped_wait needs a __waitable_state
>>    scoped_wait s(__args);
>>    bool timeout = !__platform_wait_until(__wait_addr, __args._M_old, __atime);
>>    return { ._M_val = __args._M_old, ._M_has_val = false, ._M_timeout = timeout };
>> @@ -460,6 +495,7 @@ __wait_until_impl(const void* __addr, __wait_args_base& __args,
>>    __atomic_load(__wait_addr, &__val, __args._M_order);
>>    if (__val == __args._M_old)
>>      {
>> +      auto __state = static_cast<__waitable_state*>(__args._M_wait_state);
>>        bool timeout = !__cond_wait_until(__state->_M_cv, __state->_M_mtx, __atime);
>>        return { ._M_val = __val, ._M_has_val = false, ._M_timeout = timeout };
>>      }
>> --
>> 2.51.1
