On Mon, May 5, 2025, 11:48 AM Cassio Neri <cassio.n...@gmail.com> wrote:

> Use a better algorithm to calculate n % 7, taking into account that the
> divisor
> is a Mersenne number. This is explained in this two-part lightning talk at
> CppCon 2024 [1, 2].
>
> Roughly speaking, the algorithm is used for n = sys_days but it has a
> precondition that doesn't hold for all values of sys_days. However, it
> does hold
> for every value coming out of year_month_day::operator sys_days().
>
> Nevertheless, there's an issue because the conversion is done in two steps:
> year_month_day -> sys_days -> weekday and in the second step the
> information
> that the input sys_days was obtained from the first step (so the
> precondition
> holds) is lost. In the talk a single function performs the two steps and
> uses
> the optimization. Unless the standard adds a similar function (e.g. a
> weekday
> constructor) or this is added as an extension (not done here), this is not
> an
> option for libstdc++.
>
> The issue is adressed in the following way. At the end of the fist step,
> the
> code informs the compiler, through an [[assume]] attribute that the
> precondition
> holds. Then, at the beginning of the second step, the code checks, through
> __builtin_constant_p, if the precondition holds, in which case, it uses the
> better algorithm or, otherwise, it falls back to n % 7.
>

I suspect we want to disable this for -Os . And plus i am not 100%
convinced it is best for all micro-architures. Especially on say aarch64.
Can you do more benchmarking and peocide which exaxt core is being used?
And mention the size difference too?

Plus gcc knows how to do %7 via multiplication is that being used or is it
due to generic x86 tuning it is using the div instruction?

Thanks,
Andrew



> The idea is illustrated in [3] which compares code generation (with and
> without
> the optimization) and provides an exhaustive test of correctness. A
> benchmark is
> shown in [4].
>
> References:
>
> [1] https://youtu.be/64mTEXnSnZs
> [2] https://youtu.be/bnVkWEjRNeI
> [3] https://godbolt.org/z/E14asPK8r
> [4] https://quick-bench.com/q/vzxlrLaEccner3h-ahIrbB0BZvE
>
> libstdc++-v3/ChangeLog:
>
>         * include/std/chrono: Add postcondition to
> year_month_date::operator
>         sys_days and check this condition in weekday::_S_from_days.
> ---
> The approach might be somewhat controversial and might need to be
> discussed. The problem is that passing information from [[assume]] to
> __builtin_constant_p is
> quite fragile in 3 different ways.
>
> 1) The distance in source code between [[assume]] and __builtin_constant_p
> makes
> hard to understand why they are there in the first place. A reader who
> doesn't
> know the full picture might feel tempted to remove either of these lines.
>
> 2) Logically, the post-condition expressed by [[assume]] only needs to
> imply the
> condition tested by __builtin_constant_p. However, even whey they are
> obviously
> equivalent for a human, this might not be the case for the compiler. Even
> slight cosmetic changes might break the compiler's chain of thought and
> thus,
> the optimization.
>
> 3) Similarly to 2, changes in the compiler code that handles [[assume]] or
> __builtin_constant_p might break the optimization.
>
> If any of the issues above happens, the result will still be right but a
> performance regression will appear. Comments can mitigate the issues but a
> performance test or a test for code generation would be more effective to
> prevent performance regressions. Unfortunately, I don't know how either of
> these
> can be done.
>
>  libstdc++-v3/include/std/chrono | 27 +++++++++++++++++++++++++--
>  1 file changed, 25 insertions(+), 2 deletions(-)
>
> diff --git a/libstdc++-v3/include/std/chrono
> b/libstdc++-v3/include/std/chrono
> index 8eb9fd9baac..e762dd46cc4 100644
> --- a/libstdc++-v3/include/std/chrono
> +++ b/libstdc++-v3/include/std/chrono
> @@ -984,7 +984,20 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>        static constexpr weekday
>        _S_from_days(const days& __d)
>        {
> -       return weekday{__detail::__add_modulo<7>(4, __d.count())};
> +       const auto __days = __d.count();
> +
> +       const auto __n = static_cast<unsigned>(__days) + 12'687'434u;
> +       const auto __use_fast_mod7 = __n < 178'956'973u;
> +       // This condition is true when __d = __dp.time_since_epoch() where
> __dp
> +       // comes from year_month_day::operator sys_days().
> +       if (__builtin_constant_p(__use_fast_mod7))
> +       {
> +         const auto __wd = 613'566'757u * __n >> 29;
> +         [[assume(__wd != 7)]];
> +         return weekday{__wd};
> +       }
> +
> +       return weekday{__detail::__add_modulo<7>(4, __days)};
>        }
>
>      public:
> @@ -1652,7 +1665,17 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>
>        constexpr
>        operator sys_days() const noexcept
> -      { return sys_days{_M_days_since_epoch()}; }
> +      {
> +       const auto __days = _M_days_since_epoch();
> +
> +       // The assume attribute below allows weekday::_S_from_days to use a
> +       // faster algorithm for modulus 7.
> +       const auto __n = static_cast<unsigned>(__days.count()) +
> 12'687'434u;
> +       const auto __use_fast_mod7 = __n < 178'956'973u;
> +       [[assume(__use_fast_mod7)]];
> +
> +       return sys_days{__days};
> +      }
>
>        explicit constexpr
>        operator local_days() const noexcept
> --
> 2.49.0
>
>

Reply via email to