Tested on x86_64-pc-linux-gnu, does this look OK for trunk?
This is a safer alternative to the [[gnu::flatten]] patch and
provides most of the speedup of that approach.
-- >8 --
The compiler understandably doesn't know that _M_node has only a
single call site, _M_dfs (it is not called directly from other library
headers or from user code), and so decides not to inline it. Use the
always_inline attribute to tell the compiler to inline it anyway. This
seems sufficient to get all _Executor subroutines inlined into
_M_dfs, and speeds up the executor by 30% according to some microbenchmarks.
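
For readers unfamiliar with the pattern, here is a minimal standalone
sketch of the same idea outside libstdc++. Walker, step, sum and demo
are hypothetical names chosen for illustration and are not part of
the patch; step stands in for _M_node and sum for its single caller
_M_dfs. The attribute is applied only when __OPTIMIZE__ is defined,
and the definition is declared inline, since GCC may otherwise warn
that an always_inline function might not be inlinable.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for the executor; not libstdc++ code.
template<typename T>
class Walker
{
public:
  explicit Walker(std::vector<T> v) : values_(std::move(v)) { }

  // The only caller of step(), analogous to _M_dfs calling _M_node.
  T sum()
  {
    T total{};
    for (std::size_t i = 0; i < values_.size(); ++i)
      step(i, total);
    return total;
  }

private:
  // Declared here without the attribute; the out-of-class definition
  // below carries it, mirroring the layout of the .tcc file.
  void step(std::size_t i, T& total);

  std::vector<T> values_;
};

// Force inlining only in optimized builds, so that unoptimized builds
// keep a separate, debuggable function.
template<typename T>
#ifdef __OPTIMIZE__
[[gnu::always_inline]]
#endif
inline void Walker<T>::step(std::size_t i, T& total)
{
  total += values_[i];
}

// Example use: the result does not depend on whether step() is inlined.
inline int demo()
{
  Walker<int> w(std::vector<int>{1, 2, 3});
  int total = w.sum();
  assert(total == 6);
  return total;
}
```

The attribute changes only code generation, not behavior, so the
observable result should be identical with and without optimization.
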
libstdc++-v3/ChangeLog:
* include/bits/regex_executor.tcc (__detail::_Executor::_M_node)
[__OPTIMIZE__]: Add [[gnu::always_inline]] attribute. Declare
inline.
---
libstdc++-v3/include/bits/regex_executor.tcc | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/libstdc++-v3/include/bits/regex_executor.tcc b/libstdc++-v3/include/bits/regex_executor.tcc
index ccdec934b49c..3412ad683e46 100644
--- a/libstdc++-v3/include/bits/regex_executor.tcc
+++ b/libstdc++-v3/include/bits/regex_executor.tcc
@@ -578,7 +578,10 @@ _GLIBCXX_BEGIN_INLINE_ABI_NAMESPACE(_V2)
template<typename _BiIter, typename _Alloc, typename _TraitsT,
bool __dfs_mode>
- void _Executor<_BiIter, _Alloc, _TraitsT, __dfs_mode>::
+#ifdef __OPTIMIZE__
+ [[__gnu__::__always_inline__]]
+#endif
+ inline void _Executor<_BiIter, _Alloc, _TraitsT, __dfs_mode>::
_M_node(_Match_mode __match_mode, _StateIdT __i)
{
if (_M_states._M_visited(__i))
--
2.53.0.rc1.65.gea24e2c554