On Tue, 25 Nov 2025, 15:10 Jonathan Wakely, <[email protected]> wrote:

> This will allow us to extend atomic waiting functions to support a
> possible future 64-bit version of futex, as well as supporting
> futex-like wait/wake primitives on other targets (e.g. macOS has
> os_sync_wait_on_address and FreeBSD has _umtx_op).
>
> Before this change, the decision of whether to do a proxy wait or to
> wait on the atomic variable itself was made in the header at
> compile-time, which makes it an ABI property that would not have been
> possible to change later.  That would have meant that
> std::atomic<uint64_t> would always have to do a proxy wait even if Linux
> gains support for 64-bit futex2(2) calls at some point in the future.
> The disadvantage of proxy waits is that several distinct atomic objects
> can share the same proxy state, leading to contention between threads
> even when they are not waiting on the same atomic object, similar to
> false sharing. It also results in spurious wake-ups because doing a
> notify on an atomic object that uses a proxy wait will wake all waiters
> sharing the proxy.
>
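
To make the contention point concrete, here is a minimal sketch of how two unrelated objects can end up on the same proxy. It is not the libstdc++ code: the table size (16) and the shift-and-mask hash are assumptions chosen for illustration, and proxy_slots, proxy_slot_for and share_proxy are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical proxy table with 16 slots. The real table size and hash
// function in libstdc++ may differ; these are illustrative assumptions.
inline constexpr std::size_t proxy_slots = 16;

// Map an object address to a proxy slot.
constexpr std::size_t
proxy_slot_for(std::uintptr_t addr) noexcept
{
  // Drop the low two bits (assumes at least 4-byte alignment),
  // then wrap into the table.
  return (addr >> 2) % proxy_slots;
}

// True if waits on the two addresses would share one proxy state, so that
// a notify on one object also wakes threads waiting on the other.
constexpr bool
share_proxy(std::uintptr_t a, std::uintptr_t b) noexcept
{ return proxy_slot_for(a) == proxy_slot_for(b); }
```

With this hash, any two addresses that differ by 4 * proxy_slots bytes collide, which is the false-sharing-like effect described above.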
> For types that are known to definitely not need a proxy wait (e.g. int
> on Linux) the header can still choose a more efficient path at
> compile-time. But for other types, the decision of whether to do a proxy
> wait is deferred to runtime, inside the library internals. This will
> make it possible for future versions of libstdc++.so to extend the set
> of types which don't need to use proxy waits, without ABI changes.
>
> The way the change works is to stop using the __proxy_wait flag that was
> set by the inline code in the headers. Instead the __wait_args struct
> has an extra pointer member which the library internals populate with
> either the address of the atomic object or the _M_ver counter in the
> proxy state. There is also a new _M_obj_size member which stores the
> size of the atomic object, so that the library can decide whether a
> proxy is needed. So for example if Linux gains 64-bit futex support then
> the library can decide not to use a proxy when _M_obj_size == 8.
> Finally, the _M_old member of the __wait_args struct is changed to
> uint64_t so that it has room to store 64-bit values, not just whatever
> size the __platform_wait_t type is (which is a 32-bit int on Linux).
> Similarly, the _M_val member of __wait_result_type changes to uint64_t
> too.
>
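
As I read the description, the two key pieces are (a) storing the old value as opaque bits in a 64-bit slot and (b) letting the library pick proxy vs. direct wait from the object size at runtime. A standalone sketch of that idea follows. The names wait_value_type, to_wait_value and needs_proxy are hypothetical, and the have_futex64 parameter stands in for whatever check a future libstdc++.so might perform; none of this is the patch's actual code. The value is zero-extended in this sketch.

```cpp
#include <cstdint>

// Mirrors the role of __wait_value_type: up to 64 bits of opaque storage.
using wait_value_type = std::uint64_t;

// Store a 4-byte or 8-byte value as raw bits, zero-extended to 64 bits.
// Uses the GCC/Clang builtin that the patch itself uses.
template<typename T>
constexpr wait_value_type
to_wait_value(T val) noexcept
{
  static_assert(sizeof(T) == 4 || sizeof(T) == 8, "unsupported size");
  if constexpr (sizeof(T) == 4)
    return __builtin_bit_cast(std::uint32_t, val);
  else
    return __builtin_bit_cast(std::uint64_t, val);
}

// Hypothetical runtime policy: 4-byte objects can always be waited on
// directly; 8-byte objects only if the platform has a 64-bit wait
// primitive; everything else goes through a proxy.
constexpr bool
needs_proxy(unsigned char obj_size, bool have_futex64) noexcept
{
  if (obj_size == 4)
    return false;
  if (obj_size == 8)
    return !have_futex64;
  return true;
}
```

Because needs_proxy would live inside the shared library, changing its answer for obj_size == 8 later does not require recompiling callers, which is the ABI benefit the commit message describes.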
> libstdc++-v3/ChangeLog:
>
>         * config/abi/pre/gnu.ver: Adjust exports.
>         * include/bits/atomic_timed_wait.h
> (_GLIBCXX_HAVE_PLATFORM_TIMED_WAIT):
>         Do not define this macro.
>         (__atomic_wait_address_until_v, __atomic_wait_address_for_v):
>         Adjust assertions to check that __platform_wait_uses_type is
>         true.
>         * include/bits/atomic_wait.h (__waitable): New concept.
>         (__platform_wait_uses_type): Define separately for platforms
>         with and without platform wait.
>         (_GLIBCXX_HAVE_PLATFORM_WAIT): Do not define this macro.
>         (__wait_value_type): New typedef.
>         (__wait_result_type): Change _M_val to __wait_value_type.
>         (__wait_flags): Remove __proxy_wait enumerator. Reduce range
>         reserved for ABI version by the commented-out value.
>         (__wait_args_base::_M_old): Change type to __wait_value_type.
>         (__wait_args_base::_M_obj, __wait_args_base::_M_obj_size): New
>         data members.
>         (__wait_args::__wait_args): Set _M_obj and _M_obj_size on
>         construction.
>         (__wait_args::_M_setup_wait): Change void* parameter to deduced
>         type. Adjust bit_cast to work for types of different sizes.
>         (__wait_args::_M_load_proxy_wait_val): Remove function, replace
>         with ...
>         (__wait_args::_M_setup_wait_impl): New function.
>         (__wait_args::_S_flags_for): Do not set __proxy_wait flag.
>         (__atomic_wait_address_v): Adjust assertion to check that
>         __platform_wait_uses_type is true.
>         * src/c++20/atomic.cc (_GLIBCXX_HAVE_PLATFORM_WAIT): Define here
>         instead of in header. Check _GLIBCXX_HAVE_PLATFORM_WAIT instead
>         of _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT.
>         (__platform_wait, __platform_wait_until): Add unused parameter
>         for _M_obj_size.
>         (__spin_impl): Adjust for 64-bit __wait_args_base::_M_old.
>         (use_proxy_wait): New function.
>         (__wait_args::_M_load_proxy_wait_val): Replace with ...
>         (__wait_args::_M_setup_wait_impl): New function. Call
>         use_proxy_wait to decide at runtime whether to wait on the
>         pointer directly instead of using a proxy. If a proxy is needed,
>         set _M_obj and _M_obj_size to refer to its _M_ver member. Adjust
>         for change to type of _M_old.
>         (__wait_impl): Wait on _M_obj unconditionally. Pass _M_obj_size
>         to __platform_wait.
>         (__notify_impl): Call use_proxy_wait to decide whether to notify
>         on the address parameter or a proxy.
>         (__spin_until_impl): Adjust for change to type of _M_val.
>         (__wait_until_impl): Wait on _M_obj unconditionally. Pass
>         _M_obj_size to __platform_wait_until.
> ---
>
> v2:
>
> - Removed confusing UNKNOWN_PLATFORM_WAIT macro.
> - Used __waitable concept in __platform_wait_uses_type variable.
> - Removed __wait_flags::__proxy_wait enumerator.
> - Removed _S_bit_cast and inlined the bit cast operations.
> - Removed some unnecessary bit casts.
> - Simplified atomic loads by using __atomic_load_n.
> - Added (unused) addr parameter to use_proxy_wait function, to be used
>   for checking alignment in future.
> - Added (unused) obj_sz parameter to platform wait functions, to be used
>   for supporting 32-bit and 64-bit platform waits in future.
>
> Tested x86_64-linux and x86_64-freebsd14.0
>
>  libstdc++-v3/config/abi/pre/gnu.ver           |   3 +-
>  libstdc++-v3/include/bits/atomic_timed_wait.h |  20 +--
>  libstdc++-v3/include/bits/atomic_wait.h       | 119 ++++++++-----
>  libstdc++-v3/src/c++20/atomic.cc              | 162 +++++++++++-------
>  4 files changed, 185 insertions(+), 119 deletions(-)
>
> diff --git a/libstdc++-v3/config/abi/pre/gnu.ver
> b/libstdc++-v3/config/abi/pre/gnu.ver
> index 2e48241d51f9..3c2bd4921730 100644
> --- a/libstdc++-v3/config/abi/pre/gnu.ver
> +++ b/libstdc++-v3/config/abi/pre/gnu.ver
> @@ -2553,7 +2553,8 @@ GLIBCXX_3.4.35 {
>      _ZNSt8__detail11__wait_implEPKvRNS_16__wait_args_baseE;
>      _ZNSt8__detail13__notify_implEPKvbRKNS_16__wait_args_baseE;
>
>  
> _ZNSt8__detail17__wait_until_implEPKvRNS_16__wait_args_baseERKNSt6chrono8durationI[lx]St5ratioIL[lx]1EL[lx]1000000000EEEE;
> -    _ZNSt8__detail11__wait_args22_M_load_proxy_wait_valEPKv;
> +    _ZNSt8__detail11__wait_args18_M_setup_wait_implEPKv;
> +    _ZNSt8__detail11__wait_args20_M_setup_notify_implEPKv;
>
>      # std::chrono::gps_clock::now, tai_clock::now
>      _ZNSt6chrono9gps_clock3nowEv;
> diff --git a/libstdc++-v3/include/bits/atomic_timed_wait.h
> b/libstdc++-v3/include/bits/atomic_timed_wait.h
> index 30f7ff616840..5b3158050668 100644
> --- a/libstdc++-v3/include/bits/atomic_timed_wait.h
> +++ b/libstdc++-v3/include/bits/atomic_timed_wait.h
> @@ -75,14 +75,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>           return chrono::ceil<__w_dur>(__atime);
>        }
>
> -#ifdef _GLIBCXX_HAVE_LINUX_FUTEX
> -#define _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT
> -#else
> -// define _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT and implement
> __platform_wait_until
> -// if there is a more efficient primitive supported by the platform
> -// (e.g. __ulock_wait) which is better than pthread_cond_clockwait.
> -#endif // ! HAVE_LINUX_FUTEX
> -
>      __wait_result_type
>      __wait_until_impl(const void* __addr, __wait_args_base& __args,
>                       const __wait_clock_t::duration& __atime);
> @@ -156,9 +148,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>                                   const chrono::time_point<_Clock, _Dur>&
> __atime,
>                                   bool __bare_wait = false) noexcept
>      {
> -#ifndef _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT
> -      __glibcxx_assert(false); // This function can't be used for proxy
> wait.
> -#endif
> +      // This function must not be used if __wait_impl might use a proxy
> wait:
> +
> __glibcxx_assert(__platform_wait_uses_type<__detail::__platform_wait_t>);
> +
>        __detail::__wait_args __args{ __addr, __old, __order, __bare_wait };
>        auto __res = __detail::__wait_until(__addr, __args, __atime);
>        return !__res._M_timeout; // C++26 will also return last observed
> __val
> @@ -208,9 +200,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>                                 const chrono::duration<_Rep, _Period>&
> __rtime,
>                                 bool __bare_wait = false) noexcept
>      {
> -#ifndef _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT
> -      __glibcxx_assert(false); // This function can't be used for proxy
> wait.
> -#endif
> +      // This function must not be used if __wait_impl might use a proxy
> wait:
> +
> __glibcxx_assert(__platform_wait_uses_type<__detail::__platform_wait_t>);
> +
>        __detail::__wait_args __args{ __addr, __old, __order, __bare_wait };
>        auto __res = __detail::__wait_for(__addr, __args, __rtime);
>        return !__res._M_timeout; // C++26 will also return last observed
> __val
> diff --git a/libstdc++-v3/include/bits/atomic_wait.h
> b/libstdc++-v3/include/bits/atomic_wait.h
> index 95151479c120..a280d3534f46 100644
> --- a/libstdc++-v3/include/bits/atomic_wait.h
> +++ b/libstdc++-v3/include/bits/atomic_wait.h
> @@ -45,35 +45,50 @@
>  namespace std _GLIBCXX_VISIBILITY(default)
>  {
>  _GLIBCXX_BEGIN_NAMESPACE_VERSION
> +
>    namespace __detail
>    {
> -#ifdef _GLIBCXX_HAVE_LINUX_FUTEX
> -#define _GLIBCXX_HAVE_PLATFORM_WAIT 1
> +    // TODO: this needs to be false for types with padding, e.g. __int20.
> +    // TODO: should this be true only for integral, enum, and pointer
> types?
> +    template<typename _Tp>
> +      concept __waitable
> +       = is_scalar_v<_Tp> && (sizeof(_Tp) <= sizeof(__UINT64_TYPE__));
> +  }
> +
> +#if defined _GLIBCXX_HAVE_LINUX_FUTEX
> +  namespace __detail
> +  {
> +    // Use futex syscall on int objects.
>      using __platform_wait_t = int;
>      inline constexpr size_t __platform_wait_alignment = 4;
> +  }
> +  // Defined to true for a subset of __waitable types which are statically
> +  // known to definitely be able to use futex, not a proxy wait.
> +  template<typename _Tp>
> +    inline constexpr bool __platform_wait_uses_type
> +      = __detail::__waitable<_Tp>
> +         && sizeof(_Tp) == sizeof(int) && alignof(_Tp) >= 4;
>  #else
>  // define _GLIBCX_HAVE_PLATFORM_WAIT and implement __platform_wait()
>  // and __platform_notify() if there is a more efficient primitive
> supported
>  // by the platform (e.g. __ulock_wait()/__ulock_wake()) which is better
> than
>  // a mutex/condvar based wait.
> +  namespace __detail
> +  {
>  # if ATOMIC_LONG_LOCK_FREE == 2
>      using __platform_wait_t = unsigned long;
>  # else
>      using __platform_wait_t = unsigned int;
>  # endif
>      inline constexpr size_t __platform_wait_alignment
> -      = __alignof__(__platform_wait_t);
> -#endif
> +      = sizeof(__platform_wait_t) < __alignof__(__platform_wait_t)
> +         ? __alignof__(__platform_wait_t) : sizeof(__platform_wait_t);
>    } // namespace __detail
>
> -  template<typename _Tp>
> -    inline constexpr bool __platform_wait_uses_type
> -#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
> -      = is_scalar_v<_Tp>
> -       && ((sizeof(_Tp) == sizeof(__detail::__platform_wait_t))
> -       && (alignof(_Tp) >= __detail::__platform_wait_alignment));
> -#else
> -      = false;
> +  // This must be false for the general case where we don't know of any
> +  // futex-like syscall.
> +  template<typename>
> +    inline constexpr bool __platform_wait_uses_type = false;
>  #endif
>
>    namespace __detail
> @@ -105,10 +120,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>         return __builtin_memcmp(&__a, &__b, sizeof(_Tp)) == 0;
>        }
>
> -    // lightweight std::optional<__platform_wait_t>
> +    // Storage for up to 64 bits of value, should be considered opaque
> bits.
> +    using __wait_value_type = __UINT64_TYPE__;
> +
> +    // lightweight std::optional<__wait_value_type>
>      struct __wait_result_type
>      {
> -      __platform_wait_t _M_val;
> +      __wait_value_type _M_val;
>        unsigned char _M_has_val : 1; // _M_val value was loaded before
> return.
>        unsigned char _M_timeout : 1; // Waiting function ended with
> timeout.
>        unsigned char _M_unused : 6;  // padding
> @@ -116,12 +134,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>
>      enum class __wait_flags : __UINT_LEAST32_TYPE__
>      {
> -       __abi_version = 0,
> -       __proxy_wait = 1,
> +       __abi_version = 0x00000000,
> +       // currently unused = 1,
>         __track_contention = 2,
>         __do_spin = 4,
>         __spin_only = 8, // Ignored unless __do_spin is also set.
> -       // __abi_version_mask = 0xffff0000,
> +       // __abi_version_mask = 0xff000000,
>      };
>
>      [[__gnu__::__always_inline__]]
> @@ -143,8 +161,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>      {
>        __wait_flags _M_flags;
>        int _M_order = __ATOMIC_ACQUIRE;
> -      __platform_wait_t _M_old = 0;
> +      __wait_value_type _M_old = 0;
>        void* _M_wait_state = nullptr;
> +      const void* _M_obj = nullptr;  // The address of the object to wait
> on.
> +      unsigned char _M_obj_size = 0; // The size of that object.
>
>        // Test whether _M_flags & __flags is non-zero.
>        bool
> @@ -162,53 +182,74 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>         explicit
>         __wait_args(const _Tp* __addr, bool __bare_wait = false) noexcept
>         : __wait_args_base{ _S_flags_for(__addr, __bare_wait) }
> -       { }
> +       {
> +         _M_obj = __addr; // Might be replaced by _M_setup_wait
> +         if constexpr (__waitable<_Tp>)
> +           // __wait_impl might be able to wait directly on __addr
> +           // instead of using a proxy, depending on its size.
> +           _M_obj_size = sizeof(_Tp);
> +       }
>
>        __wait_args(const __platform_wait_t* __addr, __platform_wait_t
> __old,
>                   int __order, bool __bare_wait = false) noexcept
> -      : __wait_args_base{ _S_flags_for(__addr, __bare_wait), __order,
> __old }
> -      { }
> +      : __wait_args(__addr, __bare_wait)
> +      {
> +       _M_order = __order;
> +       _M_old = __old;
> +      }
>
>        __wait_args(const __wait_args&) noexcept = default;
>        __wait_args& operator=(const __wait_args&) noexcept = default;
>
> -      template<typename _ValFn,
> -              typename _Tp = decay_t<decltype(std::declval<_ValFn&>()())>>
> +      template<typename _Tp, typename _ValFn>
>         _Tp
> -       _M_setup_wait(const void* __addr, _ValFn __vfn,
> +       _M_setup_wait(const _Tp* __addr, _ValFn __vfn,
>                       __wait_result_type __res = {})
>         {
> +         static_assert(is_same_v<_Tp, decay_t<decltype(__vfn())>>);
> +
>           if constexpr (__platform_wait_uses_type<_Tp>)
>             {
> -             // If the wait is not proxied, the value we check when
> waiting
> -             // is the value of the atomic variable itself.
> +             // If we know for certain that this type can be waited on
> +             // efficiently using something like a futex syscall,
> +             // then we can avoid the overhead of _M_setup_wait_impl
> +             // and just load the value and store it into _M_old.
>
> -             if (__res._M_has_val) // The previous wait loaded a recent
> value.
> +             if (__res._M_has_val) // A previous wait loaded a recent
> value.
>                 {
>                   _M_old = __res._M_val;
> -                 return __builtin_bit_cast(_Tp, __res._M_val);
> +                 return (_Tp)_M_old;
>                 }
>               else // Load the value from __vfn
>                 {
> -                 _Tp __val = __vfn();
> -                 _M_old = __builtin_bit_cast(__platform_wait_t, __val);
> +                 auto __val = __vfn();
> +                 if constexpr (sizeof(_Tp) == sizeof(__UINT32_TYPE__))
> +                   _M_old = __builtin_bit_cast(__UINT32_TYPE__, __val);
> +                 else if constexpr (sizeof(_Tp) ==
> sizeof(__UINT64_TYPE__))
> +                   _M_old = __builtin_bit_cast(__UINT64_TYPE__, __val);
> +                 else
> +                   static_assert(false); // Unsupported size
>                   return __val;
>                 }
>             }
> -         else // It's a proxy wait and the proxy's _M_ver is used.
> +         else // The library will decide whether to use a proxy wait.
>             {
>               if (__res._M_has_val) // The previous wait loaded a recent
> value.
>                 _M_old = __res._M_val;
> -             else // Load _M_ver from the proxy (must happen before
> __vfn()).
> -               _M_load_proxy_wait_val(__addr);
> +             else // Let the library decide how to setup the wait.
> +               {
> +                 // Set _M_obj to the address to be waited on (either
> __addr
> +                 // or a proxy) and load its current value into _M_old.
> +                 _M_setup_wait_impl(__addr);
> +               }
>               return __vfn();
>             }
>         }
>
>      private:
> -      // Populates _M_wait_state and _M_old from the proxy for __addr.
> +      // Populates _M_wait_state and _M_old appropriately for __addr.
>        void
> -      _M_load_proxy_wait_val(const void* __addr);
> +      _M_setup_wait_impl(const void* __addr);
>
>        template<typename _Tp>
>         static constexpr __wait_flags
> @@ -218,8 +259,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>           __wait_flags __res = __abi_version | __do_spin;
>           if (!__bare_wait)
>             __res |= __track_contention;
> -         if constexpr (!__platform_wait_uses_type<_Tp>)
> -           __res |= __proxy_wait;
>           return __res;
>         }
>      };
> @@ -255,9 +294,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>                           __detail::__platform_wait_t __old,
>                           int __order, bool __bare_wait = false)
>    {
> -#ifndef _GLIBCXX_HAVE_PLATFORM_WAIT
> -    __glibcxx_assert(false); // This function can't be used for proxy
> wait.
> -#endif
> +    // This function must not be used if __wait_impl might use a proxy
> wait:
> +
> __glibcxx_assert(__platform_wait_uses_type<__detail::__platform_wait_t>);
> +
>      __detail::__wait_args __args{ __addr, __old, __order, __bare_wait };
>      // C++26 will not ignore the return value here
>      __detail::__wait_impl(__addr, __args);
> diff --git a/libstdc++-v3/src/c++20/atomic.cc
> b/libstdc++-v3/src/c++20/atomic.cc
> index e280045b619d..aeb4ea3e2466 100644
> --- a/libstdc++-v3/src/c++20/atomic.cc
> +++ b/libstdc++-v3/src/c++20/atomic.cc
> @@ -27,25 +27,18 @@
>  #if __glibcxx_atomic_wait
>  #include <atomic>
>  #include <bits/atomic_timed_wait.h>
> -#include <bits/functional_hash.h>
> -#include <cstdint>
> +#include <cstdint> // uint32_t, uint64_t
> +#include <climits> // INT_MAX
> +#include <cerrno>  // errno, ETIMEDOUT, etc.
>  #include <bits/std_mutex.h>  // std::mutex, std::__condvar
> +#include <bits/functexcept.h> // __throw_system_error
> +#include <bits/functional_hash.h>
>
>  #ifdef _GLIBCXX_HAVE_LINUX_FUTEX
> -# include <cerrno>
> -# include <climits>
>  # include <unistd.h>
>  # include <syscall.h>
> -# include <bits/functexcept.h>
> -# include <sys/time.h>
> -#endif
> -
> -#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
> -# ifndef _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT
> -// __waitable_state assumes that we consistently use the same
> implementation
> -// (i.e. futex vs mutex+condvar) for timed and untimed waiting.
> -#  error "This configuration is not currently supported"
> -# endif
> +# include <sys/time.h> // timespec
> +# define _GLIBCXX_HAVE_PLATFORM_WAIT 1
>  #endif
>
>  #pragma GCC diagnostic ignored "-Wmissing-field-initializers"
> @@ -77,7 +70,7 @@ namespace
>    };
>
>    void
> -  __platform_wait(const int* __addr, int __val) noexcept
> +  __platform_wait(const int* __addr, int __val, int /* obj_size */)
> noexcept
>


For macOS the __platform_notify function also needs an obj_size parameter.
The __ulock_wake function needs different arguments depending on whether
it's waking a 32-bit or 64-bit wait.
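
Roughly what I have in mind, as a sketch only: the operation passed to the wake primitive would be chosen from obj_size. The constants below are made-up placeholders, not the real platform values, and wake_operation is a hypothetical helper.

```cpp
#include <cstdint>

// Placeholder operation codes and flag. The real values must come from
// the platform's headers; these numbers are assumptions for illustration.
inline constexpr std::uint32_t op_wait32 = 1;
inline constexpr std::uint32_t op_wait64 = 2;
inline constexpr std::uint32_t flag_wake_all = 0x100;

// Choose the wake operation that matches the size used for the wait.
// A size-aware __platform_notify(addr, all, obj_size) could pass the
// result to the platform's wake call.
constexpr std::uint32_t
wake_operation(int obj_size, bool all) noexcept
{
  std::uint32_t op = (obj_size == 8) ? op_wait64 : op_wait32;
  if (all)
    op |= flag_wake_all;
  return op;
}
```

The point is only that the notify side has to know the same size as the wait side, so obj_size needs to be threaded through both.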


>    {
>      auto __e = syscall (SYS_futex, __addr,
>
> static_cast<int>(__futex_wait_flags::__wait_private),
> @@ -107,7 +100,7 @@ namespace
>      // Count of threads blocked waiting on this state.
>      alignas(_S_align) __platform_wait_t _M_waiters = 0;
>
> -#ifndef _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT
> +#ifndef _GLIBCXX_HAVE_PLATFORM_WAIT
>      mutex _M_mtx;
>
>      // This type meets the Cpp17BasicLockable requirements.
> @@ -123,7 +116,7 @@ namespace
>      // use this for waiting and notifying functions instead.
>      alignas(_S_align) __platform_wait_t _M_ver = 0;
>
> -#ifndef _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT
> +#ifndef _GLIBCXX_HAVE_PLATFORM_WAIT
>      __condvar _M_cv;
>  #endif
>
> @@ -215,18 +208,18 @@ namespace
>    __wait_result_type
>    __spin_impl(const __platform_wait_t* __addr, const __wait_args_base&
> __args)
>    {
> -    __platform_wait_t __val{};
> +    __wait_value_type wval;
>      for (auto __i = 0; __i < __atomic_spin_count; ++__i)
>        {
> -       __atomic_load(__addr, &__val, __args._M_order);
> -       if (__val != __args._M_old)
> -         return { ._M_val = __val, ._M_has_val = true, ._M_timeout =
> false };
> +       wval = __atomic_load_n(__addr, __args._M_order);
> +       if (wval != __args._M_old)
> +         return { ._M_val = wval, ._M_has_val = true, ._M_timeout = false
> };
>         if (__i < __atomic_spin_count_relax)
>           __thread_relax();
>         else
>           __thread_yield();
>        }
> -    return { ._M_val = __val, ._M_has_val = true, ._M_timeout = true };
> +    return { ._M_val = wval, ._M_has_val = true, ._M_timeout = true };
>    }
>
>    inline __waitable_state*
> @@ -237,32 +230,58 @@ namespace
>      return static_cast<__waitable_state*>(args._M_wait_state);
>    }
>
> +  [[gnu::always_inline]]
> +  inline bool
> +  use_proxy_wait(const __wait_args_base& args, const void* /* addr */)
> +  {
> +    if constexpr (__platform_wait_uses_type<uint32_t>)
> +      if (args._M_obj_size == sizeof(uint32_t))
> +       return false;
> +
> +    if constexpr (__platform_wait_uses_type<uint64_t>)
> +      if (args._M_obj_size == sizeof(uint64_t))
> +       return false;
> +
> +    // Use proxy wait for everything else:
> +    return true;
> +  }
> +
>  } // namespace
>
> -// Called for a proxy wait
> +// Set _M_wait_state if using proxy wait, or caller wants contention
> tracking.
> +// Set _M_obj to &_M_wait_state->_M_ver if using proxy wait.
> +// Load the current value from _M_obj and store in _M_old.
>  void
> -__wait_args::_M_load_proxy_wait_val(const void* addr)
> +__wait_args::_M_setup_wait_impl(const void* addr)
>  {
> -  // __glibcxx_assert( *this & __wait_flags::__proxy_wait );
> +  if (!use_proxy_wait(*this, addr))
> +    {
> +      // We can wait on this address directly.
> +      __glibcxx_assert(_M_obj == addr);
>
> -  // We always need a waitable state for proxy waits.
> +      int val;
> +      __atomic_load(static_cast<const int*>(addr), &val,
> __ATOMIC_ACQUIRE);
> +      _M_old = val;
> +
> +      return;
> +    }
> +
> +  // This will be a proxy wait, so get a waitable state.
>    auto state = set_wait_state(addr, *this);
>
> +  // The address we will wait on is the version count of the waitable
> state:
> +  _M_obj = &state->_M_ver;
> +  // __wait_impl and __wait_until_impl need to know this size:
> +  _M_obj_size = sizeof(state->_M_ver);
> +
>    // Read the value of the _M_ver counter.
> -  __atomic_load(&state->_M_ver, &_M_old, __ATOMIC_ACQUIRE);
> +  _M_old = __atomic_load_n(&state->_M_ver, __ATOMIC_ACQUIRE);
>  }
>
>  __wait_result_type
> -__wait_impl(const void* __addr, __wait_args_base& __args)
> +__wait_impl([[maybe_unused]] const void* __addr, __wait_args_base& __args)
>  {
> -  auto __state = static_cast<__waitable_state*>(__args._M_wait_state);
> -
> -  const __platform_wait_t* __wait_addr;
> -
> -  if (__args & __wait_flags::__proxy_wait)
> -    __wait_addr = &__state->_M_ver;
> -  else
> -    __wait_addr = static_cast<const __platform_wait_t*>(__addr);
> +  auto* __wait_addr = static_cast<const
> __platform_wait_t*>(__args._M_obj);
>
>    if (__args & __wait_flags::__do_spin)
>      {
> @@ -277,7 +296,7 @@ __wait_impl(const void* __addr, __wait_args_base&
> __args)
>    if (__args & __wait_flags::__track_contention)
>      set_wait_state(__addr, __args); // scoped_wait needs a
> __waitable_state
>    scoped_wait s(__args);
> -  __platform_wait(__wait_addr, __args._M_old);
> +  __platform_wait(__wait_addr, __args._M_old, __args._M_obj_size);
>    // We haven't loaded a new value so return _M_has_val=false
>    return { ._M_val = __args._M_old, ._M_has_val = false, ._M_timeout =
> false };
>  #else
> @@ -286,6 +305,7 @@ __wait_impl(const void* __addr, __wait_args_base&
> __args)
>    __atomic_load(__wait_addr, &__val, __args._M_order);
>    if (__val == __args._M_old)
>      {
> +      auto __state = static_cast<__waitable_state*>(__args._M_wait_state);
>        __state->_M_cv.wait(__state->_M_mtx);
>        return { ._M_val = __val, ._M_has_val = false, ._M_timeout = false
> };
>      }
> @@ -294,24 +314,40 @@ __wait_impl(const void* __addr, __wait_args_base&
> __args)
>  }
>
>  void
> -__notify_impl(const void* __addr, [[maybe_unused]] bool __all,
> +__notify_impl([[maybe_unused]] const void* __addr, [[maybe_unused]] bool
> __all,
>               const __wait_args_base& __args)
>  {
> -  auto __state = static_cast<__waitable_state*>(__args._M_wait_state);
> -  if (!__state)
> -    __state = &__waitable_state::_S_state_for(__addr);
> +  const bool __track_contention = __args &
> __wait_flags::__track_contention;
> +  const bool proxy_wait = use_proxy_wait(__args, __addr);
>
> -  [[maybe_unused]] const __platform_wait_t* __wait_addr;
> +  [[maybe_unused]] auto* __wait_addr
> +    = static_cast<const __platform_wait_t*>(__addr);
> +
> +#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
> +  // Check whether it would be a non-proxy wait for this object.
> +  // This condition must match the one in _M_setup_wait_impl to ensure
> that
> +  // the address used for the notify matches the one used for the wait.
> +  if (!proxy_wait)
> +    {
> +      if (__track_contention)
> +       if (!__waitable_state::_S_state_for(__addr)._M_waiting())
> +         return;
> +
> +      __platform_notify(__wait_addr, __all);
> +      return;
> +    }
> +#endif
> +
> +  // Either a proxy wait or we don't have platform wait/wake primitives.
> +
> +  auto __state = &__waitable_state::_S_state_for(__addr);
>
>    // Lock mutex so that proxied waiters cannot race with incrementing
> _M_ver
>    // and see the old value, then sleep after the increment and
> notify_all().
>    lock_guard __l{ *__state };
>
> -  if (__args & __wait_flags::__proxy_wait)
> +  if (proxy_wait)
>      {
> -      // Waiting for *__addr is actually done on the proxy's _M_ver.
> -      __wait_addr = &__state->_M_ver;
> -
>        // Increment _M_ver so that waiting threads see something changed.
>        // This has to be atomic because the load in _M_load_proxy_wait_val
>        // is done without the mutex locked.
> @@ -322,11 +358,11 @@ __notify_impl(const void* __addr, [[maybe_unused]]
> bool __all,
>        // they can re-evaluate their conditions to see if they should
>        // stop waiting or should wait again.
>        __all = true;
> -    }
> -  else // Use the atomic variable's own address.
> -    __wait_addr = static_cast<const __platform_wait_t*>(__addr);
>
> -  if (__args & __wait_flags::__track_contention)
> +      __wait_addr = &__state->_M_ver;
> +    }
> +
> +  if (__track_contention)
>      {
>        if (!__state->_M_waiting())
>         return;
> @@ -348,7 +384,8 @@ namespace
>  bool
>  __platform_wait_until(const __platform_wait_t* __addr,
>                       __platform_wait_t __old,
> -                     const __wait_clock_t::time_point& __atime) noexcept
> +                     const __wait_clock_t::time_point& __atime,
> +                     int /* obj_size */) noexcept
>  {
>    struct timespec __rt = chrono::__to_timeout_timespec(__atime);
>
> @@ -366,7 +403,7 @@ __platform_wait_until(const __platform_wait_t* __addr,
>  }
>  #endif // HAVE_LINUX_FUTEX
>
> -#ifndef _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT
> +#ifndef _GLIBCXX_HAVE_PLATFORM_WAIT
>  bool
>  __cond_wait_until(__condvar& __cv, mutex& __mx,
>                   const __wait_clock_t::time_point& __atime)
> @@ -381,7 +418,7 @@ __cond_wait_until(__condvar& __cv, mutex& __mx,
>      __cv.wait_until(__mx, __ts);
>    return __wait_clock_t::now() < __atime;
>  }
> -#endif // ! HAVE_PLATFORM_TIMED_WAIT
> +#endif // ! HAVE_PLATFORM_WAIT
>
>  // Unlike __spin_impl, does not always return _M_has_val == true.
>  // If the deadline has already passed then no fresh value is loaded.
> @@ -414,7 +451,7 @@ __spin_until_impl(const __platform_wait_t* __addr,
>             return __res;
>         }
>
> -      __atomic_load(__addr, &__res._M_val, __args._M_order);
> +      __res._M_val = __atomic_load_n(__addr, __args._M_order);
>        __res._M_has_val = true;
>        if (__res._M_val != __args._M_old)
>         {
> @@ -428,16 +465,11 @@ __spin_until_impl(const __platform_wait_t* __addr,
>  } // namespace
>
>  __wait_result_type
> -__wait_until_impl(const void* __addr, __wait_args_base& __args,
> +__wait_until_impl([[maybe_unused]] const void* __addr, __wait_args_base&
> __args,
>                   const __wait_clock_t::duration& __time)
>  {
>    const __wait_clock_t::time_point __atime(__time);
> -  auto __state = static_cast<__waitable_state*>(__args._M_wait_state);
> -  const __platform_wait_t* __wait_addr;
> -  if (__args & __wait_flags::__proxy_wait)
> -    __wait_addr = &__state->_M_ver;
> -  else
> -    __wait_addr = static_cast<const __platform_wait_t*>(__addr);
> +  auto* __wait_addr = static_cast<const
> __platform_wait_t*>(__args._M_obj);
>
>    if (__args & __wait_flags::__do_spin)
>      {
> @@ -448,11 +480,12 @@ __wait_until_impl(const void* __addr,
> __wait_args_base& __args,
>         return __res;
>      }
>
> -#ifdef _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT
> +#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
>    if (__args & __wait_flags::__track_contention)
> -    set_wait_state(__addr, __args);
> +    set_wait_state(__addr, __args); // scoped_wait needs a
> __waitable_state
>    scoped_wait s(__args);
> -  bool timeout = !__platform_wait_until(__wait_addr, __args._M_old,
> __atime);
> +  bool timeout = !__platform_wait_until(__wait_addr, __args._M_old,
> __atime,
> +                                       __args._M_obj_size);
>    return { ._M_val = __args._M_old, ._M_has_val = false, ._M_timeout =
> timeout };
>  #else
>    waiter_lock l(__args);
> @@ -460,6 +493,7 @@ __wait_until_impl(const void* __addr,
> __wait_args_base& __args,
>    __atomic_load(__wait_addr, &__val, __args._M_order);
>    if (__val == __args._M_old)
>      {
> +      auto __state = static_cast<__waitable_state*>(__args._M_wait_state);
>        bool timeout = !__cond_wait_until(__state->_M_cv, __state->_M_mtx,
> __atime);
>        return { ._M_val = __val, ._M_has_val = false, ._M_timeout =
> timeout };
>      }
> --
> 2.51.1
>
>
