The flatten attribute on a function tells the compiler to recursively
inline all calls within it. The DFS entry point _M_dfs, now that it's
non-recursive, is a good candidate for this attribute: its
implementation spans several subroutines that are called only from
_M_dfs and so should clearly be inlined, but the compiler doesn't
know this (specifically, it doesn't know that the subroutines are used
in the same way by every TU).
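As a rough illustration of the shape being described, here is a minimal sketch of a non-recursive driver with an explicit stack that delegates to small helper subroutines and carries [[gnu::flatten]]. The names run_worklist, visit_even and visit_odd are hypothetical and are not part of libstdc++; the real _M_dfs helpers are member templates of _Executor, whereas here they are plain non-static functions, an assumption made for brevity.

```cpp
#include <cassert>
#include <vector>

// Hypothetical subroutines standing in for the helpers of _M_dfs. They have
// external linkage, so the compiler cannot assume run_worklist() is their
// only caller.
int visit_even(std::vector<int>& stack, int v) {
    stack.push_back(v / 2);  // schedule the successor state
    return 1;
}

int visit_odd(std::vector<int>& stack, int v) {
    stack.push_back(v - 1);  // schedule the successor state
    return 1;
}

// Non-recursive driver with an explicit stack, loosely modeled on a DFS loop.
// [[gnu::flatten]] asks the compiler to inline every call in this body, and
// recursively the calls those inlined bodies make (such as the vector's
// push_back/pop_back), where possible.
[[gnu::flatten]]
int run_worklist(int start) {
    assert(start >= 0);  // the sketch only handles non-negative inputs
    std::vector<int> stack{start};
    int steps = 0;
    while (!stack.empty()) {
        int v = stack.back();
        stack.pop_back();
        if (v <= 1)
            continue;  // terminal state: nothing more to explore
        steps += (v % 2 == 0) ? visit_even(stack, v) : visit_odd(stack, v);
    }
    return steps;
}
```

For example, run_worklist(10) walks 10, 5, 4, 2, 1 and returns 4. The attribute changes only how the code is compiled, not what it computes.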
We could instead annotate each of its subroutines with always_inline,
but that forces them to be inlined even at -O0, which we don't really
want here, whereas flatten only has an effect when compiling with
optimizations AFAICT. Flatten also has the benefit of inlining all the
repetitive std::vector operations, which are themselves quite hot.
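The two alternatives differ in where the attribute goes: always_inline sits on each callee, while flatten sits on the caller. The sketch below puts both side by side; all function names are hypothetical, and the "only when optimizing" behaviour of flatten is the observation from the paragraph above rather than something the code itself demonstrates.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Alternative A (not what the patch does): the callee carries the attribute.
// GCC inlines an always_inline function into every caller, even at -O0.
[[gnu::always_inline]] inline void append_range_ai(std::vector<int>& out, int n) {
    assert(n >= 0);
    for (int i = 0; i < n; ++i)
        out.push_back(i);
}

// Alternative B helper: no attribute of its own.
inline void append_range(std::vector<int>& out, int n) {
    assert(n >= 0);
    for (int i = 0; i < n; ++i)
        out.push_back(i);
}

std::size_t count_always_inline(int n) {
    std::vector<int> out;
    append_range_ai(out, n);  // inlined here regardless of -O level
    return out.size();
}

// Alternative B (the approach taken by the patch): the caller carries the
// attribute, so only calls made from this function are inlined, and GCC
// also tries to inline what append_range itself calls, such as push_back.
// Per the observation above, this appears to matter only when optimizing.
[[gnu::flatten]] std::size_t count_flatten(int n) {
    std::vector<int> out;
    append_range(out, n);
    return out.size();
}
```

Other callers of append_range are unaffected by the flatten on count_flatten, which is what keeps the effect local to one entry point.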
So this patch adds flatten to _M_dfs. For the microbenchmark

  for (int i = 0; i < 10000; i++)
    std::regex_match(std::string(200, 'a'), std::regex("(a|b|c)*"));

(with both regex_match and _M_dfs additionally marked noinline to
prevent overfitting), this reduces run time by 50% and code size by 15%
with -O2. With -Os, run time is reduced by 30% and code size by 20%.

(The main potential downside of flatten, as I understand it, is increased
code size, but here we get smaller code overall since the aggressive
inlining exposes further simplifications.)
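The microbenchmark above could be wrapped into a small timing harness along the lines of the sketch below. The function run_regex_benchmark and the BenchResult struct are hypothetical; the match counter is an addition so the loop has an observable result. The sketch does not reproduce the extra noinline markings on regex_match and _M_dfs, since those require editing the libstdc++ headers.

```cpp
#include <cassert>
#include <chrono>
#include <regex>
#include <string>

struct BenchResult {
    int matches;     // how many iterations matched (expected: all of them)
    double seconds;  // wall-clock time for the whole loop
};

BenchResult run_regex_benchmark(int iterations) {
    assert(iterations >= 0);
    int matches = 0;
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; i++) {
        // Same expression as the microbenchmark: the regex is rebuilt on
        // every iteration, so compiling the pattern is timed as well.
        if (std::regex_match(std::string(200, 'a'), std::regex("(a|b|c)*")))
            ++matches;
    }
    std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return {matches, elapsed.count()};
}
```

Calling run_regex_benchmark(10000) from a main function built at -O2 and at -Os, against headers with and without the patch, approximates the comparison described above; absolute timings will depend on the machine.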
libstdc++-v3/ChangeLog:
* include/bits/regex_executor.tcc (__detail::_Executor::_M_dfs):
Add [[gnu::flatten]] attribute.
---
libstdc++-v3/include/bits/regex_executor.tcc | 1 +
1 file changed, 1 insertion(+)
diff --git a/libstdc++-v3/include/bits/regex_executor.tcc b/libstdc++-v3/include/bits/regex_executor.tcc
index 0d45c2b963a5..b0238945afb7 100644
--- a/libstdc++-v3/include/bits/regex_executor.tcc
+++ b/libstdc++-v3/include/bits/regex_executor.tcc
@@ -604,6 +604,7 @@ _GLIBCXX_BEGIN_INLINE_ABI_NAMESPACE(_V2)
template<typename _BiIter, typename _Alloc, typename _TraitsT,
bool __dfs_mode>
+ [[__gnu__::__flatten__]]
void _Executor<_BiIter, _Alloc, _TraitsT, __dfs_mode>::
_M_dfs(_Match_mode __match_mode, _StateIdT __start)
{