On Fri, 30 Jan 2026 at 17:53, Patrick Palka <[email protected]> wrote:
>
> Tested on x86_64-pc-linux-gnu, does this look OK for trunk?
> This is a safer alternative to the [[gnu::flatten]] patch and
> provides most of the speedup of that approach.
>
> -- >8 --
>
> The compiler understandably doesn't know that _M_node only ever has a
> single call site, _M_dfs (and is not called directly from other library
> headers or from user code), and so decides not to inline it. So use the
> always_inline attribute to tell the compiler to inline it anyway. This
> seems sufficient to make all _Executor subroutines get inlined away into
> _M_dfs, and speeds up the executor by 30% according to some microbenchmarks.
OK
>
> libstdc++-v3/ChangeLog:
>
> * include/bits/regex_executor.tcc (__detail::_Executor::_M_node)
> [__OPTIMIZE__]: Add [[gnu::always_inline]] attribute. Declare
> inline.
> ---
> libstdc++-v3/include/bits/regex_executor.tcc | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/libstdc++-v3/include/bits/regex_executor.tcc b/libstdc++-v3/include/bits/regex_executor.tcc
> index ccdec934b49c..3412ad683e46 100644
> --- a/libstdc++-v3/include/bits/regex_executor.tcc
> +++ b/libstdc++-v3/include/bits/regex_executor.tcc
> @@ -578,7 +578,10 @@ _GLIBCXX_BEGIN_INLINE_ABI_NAMESPACE(_V2)
>
> template<typename _BiIter, typename _Alloc, typename _TraitsT,
> bool __dfs_mode>
> - void _Executor<_BiIter, _Alloc, _TraitsT, __dfs_mode>::
> +#ifdef __OPTIMIZE__
> + [[__gnu__::__always_inline__]]
> +#endif
> + inline void _Executor<_BiIter, _Alloc, _TraitsT, __dfs_mode>::
> _M_node(_Match_mode __match_mode, _StateIdT __i)
> {
> if (_M_states._M_visited(__i))
> --
> 2.53.0.rc1.65.gea24e2c554
>
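
For readers unfamiliar with the pattern in the hunk above, here is a minimal
standalone sketch of the same idea: an out-of-line member of a class template
that has a single caller is marked always_inline, but only when __OPTIMIZE__
is defined (GCC predefines that macro when optimization is enabled). Walker,
walk and visit are hypothetical names made up for illustration; this is not
libstdc++ code, and it says nothing about the speedup quoted in the patch.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a recursive executor; not libstdc++ code.
template<typename T>
  class Walker
  {
  public:
    explicit Walker(const std::vector<T>& v) : values(v) { }

    // Sums values[i..end) by recursing through visit().
    T walk(std::size_t i);

  private:
    // Helper with exactly one call site: walk().
    T visit(std::size_t i);

    const std::vector<T>& values;
  };

template<typename T>
  T Walker<T>::walk(std::size_t i)
  {
    if (i >= values.size())
      return T();
    return visit(i);
  }

// Same shape as the patch: the attribute is applied only in optimized
// builds, and the definition is additionally declared inline.
template<typename T>
#ifdef __OPTIMIZE__
  [[gnu::always_inline]]
#endif
  inline T Walker<T>::visit(std::size_t i)
  {
    // Calls back into walk(), so once visit() is inlined, walk() simply
    // recurses into itself.
    return values[i] + walk(i + 1);
  }
```

A caller would construct Walker<int> over a vector and call walk(0). Whether
the helper actually disappears from the generated code depends on the compiler
and flags, so inspect the assembly rather than assume it.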