Tested on x86_64-pc-linux-gnu, does this look OK for trunk?
This is a safer alternative to the [[gnu::flatten]] patch and
retains most of the speedup of that approach.

-- >8 --

The compiler understandably doesn't know that _M_node has only a single
call site, _M_dfs (it is not called directly from other library headers
or from user code), and so decides not to inline it.  So use the
always_inline attribute to tell the compiler to inline it anyway.  This
seems sufficient to get all _Executor subroutines inlined into _M_dfs,
and speeds up the executor by 30% according to some microbenchmarks.
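
For illustration, here is a minimal standalone sketch of the same pattern
outside libstdc++.  Walker, step, run and SKETCH_ALWAYS_INLINE are
hypothetical names chosen for this example, not library code; step stands
in for a helper with a single call site.

```cpp
// Hypothetical sketch; none of these names exist in libstdc++.
// Force inlining only in optimized builds, and only where the GNU
// attribute is understood, so other compilers see an empty macro.
#if defined(__OPTIMIZE__) && (defined(__GNUC__) || defined(__clang__))
# define SKETCH_ALWAYS_INLINE [[gnu::always_inline]]
#else
# define SKETCH_ALWAYS_INLINE
#endif

template<typename T>
  struct Walker
  {
    T total = T();

    void step(T value);                   // only ever called from run()
    T run(const T* first, const T* last);
  };

// Out-of-class definition: the attribute comes first, and the function is
// also declared inline, mirroring the shape of the change in this patch.
template<typename T>
  SKETCH_ALWAYS_INLINE
  inline void Walker<T>::step(T value)
  { total += value; }

// The single call site of step().
template<typename T>
  T Walker<T>::run(const T* first, const T* last)
  {
    for (; first != last; ++first)
      step(*first);
    return total;
  }
```

As far as I know, GCC can warn that an always_inline function "might not
be inlinable" when it is not also declared inline, which is why the sketch
adds both.  The __OPTIMIZE__ guard keeps unoptimized builds unchanged.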

libstdc++-v3/ChangeLog:

        * include/bits/regex_executor.tcc (__detail::_Executor::_M_node)
        [__OPTIMIZE__]: Add [[gnu::always_inline]] attribute.  Declare
        inline.
---
 libstdc++-v3/include/bits/regex_executor.tcc | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/bits/regex_executor.tcc b/libstdc++-v3/include/bits/regex_executor.tcc
index ccdec934b49c..3412ad683e46 100644
--- a/libstdc++-v3/include/bits/regex_executor.tcc
+++ b/libstdc++-v3/include/bits/regex_executor.tcc
@@ -578,7 +578,10 @@ _GLIBCXX_BEGIN_INLINE_ABI_NAMESPACE(_V2)
 
   template<typename _BiIter, typename _Alloc, typename _TraitsT,
           bool __dfs_mode>
-    void _Executor<_BiIter, _Alloc, _TraitsT, __dfs_mode>::
+#ifdef __OPTIMIZE__
+    [[__gnu__::__always_inline__]]
+#endif
+    inline void _Executor<_BiIter, _Alloc, _TraitsT, __dfs_mode>::
     _M_node(_Match_mode __match_mode, _StateIdT __i)
     {
       if (_M_states._M_visited(__i))
-- 
2.53.0.rc1.65.gea24e2c554
