Re: [PATCH] libstdc++: Fix spinloop in atomic timed waiting function [PR122878]

Jonathan Wakely Thu, 27 Nov 2025 08:55:17 -0800

On Thu, 27 Nov 2025 at 16:01, Tomasz Kaminski <[email protected]> wrote:
>
>
>
> On Thu, Nov 27, 2025 at 4:35 PM Jonathan Wakely <[email protected]> wrote:
>>
>> The __spin_until_impl function was presumably intended to just spin for
>> a short time, then give up and let the caller wait on a futex or
>> condvar. However, __spin_until_impl will never stop spinning unless
>> either the value changes or the timeout is reached. This means that when
>> __spin_until_impl returns, the caller should immediately return (because
>> either the value we were waiting for has changed, or the timeout
>> happened). So __wait_until_impl should never block on a futex or
>> condvar. However, the check for the return value of __spin_until_impl
>> would only return if the value changed (i.e. !__res._M_timeout). So if
>> the timeout occurred, it would fall through and block on the
>> futex/condvar, even though the timeout has already been reached.
>
> Yes, it was calling sleep for max 64ms in iterations, and started spinning
> close to timeout. Which is strange, because I would assume spin loop is
> beneficial when we have nothing to wait on.
>>
>>
>> This was causing a major performance regression in the timed waiting
>> functions of std::counting_semaphore.
>>
>> The simplest fix is to replace __spin_until_impl entirely, just calling
>> __spin_impl to spin a small, finite number of times, and then return
>> immediately if either the value changed or the timeout happened. This
>> ensures that we don't block on the futex/condvar unnecessarily.
>>
>> Removing __spin_until_impl also has the advantage that we no longer keep
>> calling steady_clock::now() on every iteration to check for a timeout.
>> That was also adding significant overhead to the timed waiting
>> functions.
>
> LGTM.
>>
>>
>> libstdc++-v3/ChangeLog:
>>
>>         PR libstdc++/122878
>>         * src/c++20/atomic.cc (__spin_until_impl): Remove.
>>         (__wait_until_impl): Use __spin_impl instead of
>>         __spin_until_impl and return if timeout is reached after
>>         spinning.
>> ---
>>
>> Tested x86_64-linux.
>>
>>  libstdc++-v3/src/c++20/atomic.cc | 47 ++------------------------------
>>  1 file changed, 3 insertions(+), 44 deletions(-)
>>
>> diff --git a/libstdc++-v3/src/c++20/atomic.cc 
>> b/libstdc++-v3/src/c++20/atomic.cc
>> index fdd67d834768..16fd91b7d7ab 100644
>> --- a/libstdc++-v3/src/c++20/atomic.cc
>> +++ b/libstdc++-v3/src/c++20/atomic.cc
>> @@ -455,49 +455,6 @@ __cond_wait_until(__condvar& __cv, mutex& __mx,
>>    return __wait_clock_t::now() < __atime;
>>  }
>>  #endif // ! HAVE_PLATFORM_WAIT
>> -
>> -// Unlike __spin_impl, does not always return _M_has_val == true.
>> -// If the deadline has already passed then no fresh value is loaded.
>> -__wait_result_type
>> -__spin_until_impl(const __platform_wait_t* __addr,
>> -                 const __wait_args_base& __args,
>> -                 const __wait_clock_t::time_point& __deadline)
>> -{
>> -  using namespace literals::chrono_literals;
>> -
>> -  __wait_result_type __res{};
>> -  auto __t0 = __wait_clock_t::now();
>> -  auto __now = __t0;
>> -  for (; __now < __deadline; __now = __wait_clock_t::now())
>> -    {
>> -      auto __elapsed = __now - __t0;
>> -#ifndef _GLIBCXX_NO_SLEEP
>> -      if (__elapsed > 128ms)
>> -       this_thread::sleep_for(64ms);
>> -      else if (__elapsed > 64us)
>> -       this_thread::sleep_for(__elapsed / 2);
>> -      else
>> -#endif
>> -      if (__elapsed > 4us)
>> -       __thread_yield();
>> -      else
>> -       {
>> -         __res = __detail::__spin_impl(__addr, __args);
>> -         if (!__res._M_timeout)
>> -           return __res;
>> -       }
>> -
>> -      __res._M_val = __atomic_load_n(__addr, __args._M_order);
>> -      __res._M_has_val = true;
>> -      if (__res._M_val != __args._M_old)
>> -       {
>> -         __res._M_timeout = false;
>> -         return __res;
>> -       }
>> -    }
>> -  __res._M_timeout = true;
>> -  return __res;
>> -}
>>  } // namespace
>>
>>  __wait_result_type
>> @@ -509,11 +466,13 @@ __wait_until_impl([[maybe_unused]] const void* __addr, 
>> __wait_args_base& __args,
>
> Few lines before, we are passing wait_clock_t::duration with the time since 
> epoch value here,
> wouldn't it make more sense and make the implementation cleaner if we would 
> pass wait_clock_t::time_point instead?
> We have different types for duration and time_point for a reason.


I should add a comment about that. Passing a duration instead of a
time_point means that the choice of clock isn't baked in to the ABI.
We could use the same entry point into the library to pass a
system_clock::time_point if we assigned a new bit in the __wait_flags
structure to mean it uses the system_clock not the steady_clock. Or we
could use a flag to say that it's a relative time (as a duration) not
an absolute time as a time_point.

So you're correct that we're using the "wrong" type here, but it's a
form of type erasure (erasing the clock parameter) that makes the API
more flexible.

>
> One of two calls points have time_point and then turns it into duration:
>         auto __res = __detail::__wait_until_impl(__addr, __args,
>                                                  __at.time_since_epoch());
> And construct time point back in function. But this could be done as separate 
> patch.
>
>>
>>
>>    if (__args & __wait_flags::__do_spin)
>>      {
>> -      auto __res = __detail::__spin_until_impl(__wait_addr, __args, 
>> __atime);
>> +      auto __res = __detail::__spin_impl(__wait_addr, __args);
>>        if (!__res._M_timeout)
>>         return __res;
>>        if (__args & __wait_flags::__spin_only)
>>         return __res;
>> +      if (__wait_clock_t::now() >= __atime)
>> +       return __res;
>>      }
>>
>>  #ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
>> --
>> 2.51.1
>>

Re: [PATCH] libstdc++: Fix spinloop in atomic timed waiting function [PR122878]

Reply via email to