On Fri, 30 Jan 2026, Patrick Palka wrote:
> On Fri, 30 Jan 2026, Jonathan Wakely wrote:
>
> > On Fri, 30 Jan 2026 at 15:25, Patrick Palka <[email protected]> wrote:
> > >
> > > Changes in v8:
> > > - Use char instead of uint8_t
> > > - Decompose the first byte of _ExecutorFrame::_M_bytes into an
> > > anonymous union with code-specific members
> > > - Use [[__fallthrough__]] instead of // fallthrough
> > >
> > > Changes in v7:
> > > - Make fallback_next and fallback_rep_once_more code paths fall through
> > > to the next / rep_once_more code paths instead of pushing a frame
> > > that gets immediately popped.
> > >
> > > Changes in v6:
> > > - Use an inline namespace to give the _Executor class a different
> > > mangled name since its implementation is no longer compatible with
> > > the previous recursive one
> > > - Explicitly cast _M_subexpr to _StateIdT in _M_handle_subexpr_begin/end
> > > to self-document that what we're storing in _ExecutorFrame::_M_state_id
> > > isn't really a state id
> > > - I experimented with using std::deque instead of std::vector for the
> > > frames stack.
> > > Upsides:
> > > - Reduces peak memory use from 50MB to 33MB for the microbenchmark
> > > regex_match(string(200000, 'a'), "(a|b|c)*")
> > > due to less unused capacity (vector's 2x growth factor means that
> > > 50% of the time there will be at least 25% unused capacity)
> > > - Seems to be as fast as vector
> > > - Automatically reclaims unused capacity as the stack unwinds
> > > - No expensive reallocation needed when growing the stack in the
> > > case of non-trivially-copyable input iterators
> > > Downsides:
> > > - deque will allocate/free a chunk of memory whenever crossing a
> > > chunk boundary (512 elements) in either direction, which could
> > > degrade performance in rare cases where our stack size hovers
> > > near a multiple of the chunk boundary.
> > > - the fixed chunk size means small inputs will use at least
> > > 512*24 bytes=12kB of memory
> > > Ultimately it seems using std::deque is better for very large
> > > inputs, and for small inputs std::vector is slightly better. I'm
> > > not sure which container to go with, for now I stuck with std::vector.
> > > (N.B. we seem to be 2x faster than libc++'s std::regex and use 60%
> > > less memory for the above microbenchmark)
> > >
> > > Changes in v5:
> > > - Replace char _ExecutorFrame::_M_char with
> > > uint8_t _ExecutorFrame::_M_bytes[7] to fully account for
> > > the implicit struct padding and make it char size/signedness
> > > agnostic.
> > >
> > > Changes in v4:
> > > - Do not remove the BFS executor. I convinced myself that it's
> > > worthwhile to keep it around for backwards compatibility and its
> > > better time complexity (e.g. for PR78276).
> > > - However, this means the mangling of _Executor is now the same as
> > > before (since the __dfs_mode template parameter is no longer gone).
> > > I believe we need to give this class a different mangling so
> > > programs that mix old/new std::regex impls continue to work.
> > >
> > > Changes in v3:
> > >
> > > - Add char _M_char member to _ExecutorFrame (without increasing the
> > > size of struct thanks to padding)
> > > - Use it to fix bug in restore_subexpr_end frame code whereby we
> > > weren't restoring _M_cur_results.matched properly
> > > - Combine restore_subexpr_begin/end frame codes into one
> > > restore_cur_results frame code
> > > - Combine restore_rep_count_val/posn frame codes into single
> > > restore_rep_count frame_code (this makes the patch 3/4 from v2
> > > obsolete)
> > > - Store the frame stack in the data member _ExecutorFrame::_M_frames
> > > alongside the rest of the executor state instead of using a local.
> > > Using a local variable is just more code with no real benefit.
> > >
> > > -- >8 --
> > >
> > > libstdc++/regex: Convert DFS executor to use iteration [PR86164]
> > >
> > > This replaces the recursion and implicit system stack usage of the DFS
> > > executor with iteration and an explicit heap-based stack. After this
> > > patch, we no longer stack overflow on the PR86164 testcases since system
> > > stack usage is no longer linear with respect to input size. This should
> > > otherwise be a non-functional change.
> > >
> > > PR libstdc++/86164
> > >
> > > libstdc++-v3/ChangeLog:
> > >
> > > * include/bits/regex.h (__detail::_Executor): Use inline
> > > namespace _V2.
> > > * include/bits/regex_executor.h (__detail::_ExecutorFrame):
> > > Declare.
> > > (__detail::_Executor): Use inline namespace _V2.
> > > (__detail::_Executor::_M_node): Declare.
> > > (__detail::_Executor::_M_frames): New data member.
> > > * include/bits/regex_executor.tcc
> > > (__detail::_ExecutorFrameOpcode):
> > > New.
> > > (__detail::_ExecutorFrame): New.
> > > (__detail::_Executor): Use inline namespace _V2.
> > > (__detail::_Executor::_M_rep_once_more): Replace recursive
> > > _M_dfs calls with an _S_opcode_next frame push, and any tail work
> > > after such calls with an appropriate frame push.
> > > (__detail::_M_handle_repeat): Likewise.
> > > (__detail::_M_handle_subexpr_begin): Likewise.
> > > (__detail::_M_handle_subexpr_end): Likewise.
> > > (__detail::_M_handle_line_begin_assertion): Likewise.
> > > (__detail::_M_handle_line_end_assertion): Likewise.
> > > (__detail::_M_handle_word_boundary): Likewise.
> > > (__detail::_M_handle_subexpr_lookahead): Likewise.
> > > (__detail::_M_handle_match): Likewise.
> > > (__detail::_M_handle_backref): Likewise.
> > > (__detail::_M_handle_accept): Likewise.
> > > (__detail::_M_handle_alternative): Likewise.
> > > (__detail::_M_node): Factored out from _M_dfs.
> > > (__detail::_M_dfs): Push an initial frame to _M_frames that
> > > visits the starting node and pass this stack each subroutine.
> > > Pop the latest _ExecutorFrame from _M_frames and handle
> > > appropriately according to its _ExecutorFrameOpcode. Loop until
> > > _M_frames is empty.
> > > * include/std/regex: Include <cstdint>.
> > > ---
> > > libstdc++-v3/include/bits/regex.h | 2 +
> > > libstdc++-v3/include/bits/regex_executor.h | 9 +
> > > libstdc++-v3/include/bits/regex_executor.tcc | 233 +++++++++++++++----
> > > 3 files changed, 199 insertions(+), 45 deletions(-)
> > >
> > > diff --git a/libstdc++-v3/include/bits/regex.h
> > > b/libstdc++-v3/include/bits/regex.h
> > > index 68fb6254f362..5bcb9a447ce2 100644
> > > --- a/libstdc++-v3/include/bits/regex.h
> > > +++ b/libstdc++-v3/include/bits/regex.h
> > > @@ -61,8 +61,10 @@ namespace __detail
> > > _RegexExecutorPolicy __policy,
> > > bool __match_mode);
> > >
> > > +_GLIBCXX_BEGIN_INLINE_ABI_NAMESPACE(_V2)
> > > template<typename, typename, typename, bool>
> > > class _Executor;
> > > +_GLIBCXX_END_INLINE_ABI_NAMESPACE(_V2)
> > >
> > > template<typename _Tp>
> > > struct __is_contiguous_iter : false_type { };
> > > diff --git a/libstdc++-v3/include/bits/regex_executor.h
> > > b/libstdc++-v3/include/bits/regex_executor.h
> > > index f9857bfc4c1d..94048e1f632c 100644
> > > --- a/libstdc++-v3/include/bits/regex_executor.h
> > > +++ b/libstdc++-v3/include/bits/regex_executor.h
> > > @@ -36,11 +36,15 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> > >
> > > namespace __detail
> > > {
> > > +_GLIBCXX_BEGIN_INLINE_ABI_NAMESPACE(_V2)
> > > /**
> > > * @addtogroup regex-detail
> > > * @{
> > > */
> > >
> > > + template<typename _BiIter, bool _Trivial =
> > > is_trivially_copyable<_BiIter>::value>
> > > + struct _ExecutorFrame;
> > > +
> > > /**
> > > * @brief Takes a regex and an input string and does the matching.
> > > *
> > > @@ -142,6 +146,9 @@ namespace __detail
> > > void
> > > _M_handle_alternative(_Match_mode, _StateIdT);
> > >
> > > + void
> > > + _M_node(_Match_mode, _StateIdT);
> > > +
> > > void
> > > _M_dfs(_Match_mode __match_mode, _StateIdT __start);
> > >
> > > @@ -290,6 +297,7 @@ namespace __detail
> > > };
> > >
> > > public:
> > > + _GLIBCXX_STD_C::vector<_ExecutorFrame<_BiIter>> _M_frames;
> > > _ResultsVec
> > > _M_cur_results;
> > > _BiIter _M_current;
> > > _BiIter _M_begin;
> > > @@ -305,6 +313,7 @@ namespace __detail
> > > };
> > >
> > > ///@} regex-detail
> > > +_GLIBCXX_END_INLINE_ABI_NAMESPACE(_V2)
> > > } // namespace __detail
> > > _GLIBCXX_END_NAMESPACE_VERSION
> > > } // namespace std
> > > diff --git a/libstdc++-v3/include/bits/regex_executor.tcc
> > > b/libstdc++-v3/include/bits/regex_executor.tcc
> > > index 421e585f39d9..36c7e05dcdfb 100644
> > > --- a/libstdc++-v3/include/bits/regex_executor.tcc
> > > +++ b/libstdc++-v3/include/bits/regex_executor.tcc
> > > @@ -36,6 +36,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> > > #pragma GCC diagnostic ignored "-Wc++17-extensions" // if constexpr
> > > namespace __detail
> > > {
> > > +_GLIBCXX_BEGIN_INLINE_ABI_NAMESPACE(_V2)
> > > template<typename _BiIter, typename _Alloc, typename _TraitsT,
> > > bool __dfs_mode>
> > > bool _Executor<_BiIter, _Alloc, _TraitsT, __dfs_mode>::
> > > @@ -55,6 +56,85 @@ namespace __detail
> > > return false;
> > > }
> > >
> > > + enum _ExecutorFrameOpcode : unsigned char
> > > + {
> > > + _S_fopcode_next,
> > > + _S_fopcode_fallback_next,
> > > + _S_fopcode_rep_once_more,
> > > + _S_fopcode_fallback_rep_once_more,
> > > + _S_fopcode_posix_alternative,
> > > + _S_fopcode_merge_sol,
> > > + _S_fopcode_restore_current,
> > > + _S_fopcode_restore_cur_results,
> > > + _S_fopcode_restore_rep_count,
> > > + _S_fopcode_decrement_rep_count,
> > > + };
> > > +
> > > +#define _GLIBCXX_EXECUTOR_FRAME_BYTES_LAYOUT \
> > > + union { \
> > > + unsigned char _M_byte0; \
> > > + struct { /* rep_count frame */ \
> > > + unsigned char _M_count : 2; \
> > > + }; \
> > > + struct { /* restore_cur_results frame */ \
> > > + unsigned char _M_end : 1; \
> > > + unsigned char _M_matched : 1; \
> > > + }; \
> > > + }; \
> > > + unsigned char _M_bytes[6]
> >
> > Ah, it didn't occur to me that repacing _M_bytes[7] with this union
> > would need to be spammed out in two places, sorry!
> >
> > This patch is OK for trunk, thanks!
> >
> > Maybe as a follow-up we can introduce:
> >
> > struct _ExecutorFrameBase
> > {
> > _ExecutorFrameOpcode _M_op;
> > ... the union ...
> > unsigned char _M_bytes[6];
> > _StateIdT _M_state_id;
> > };
> >
> > for the parts that are common to the primary template and the partial
> > specialization, and then make them both derive from that.
>
> D'oh, using a base class is much cleaner! I'd prefer making this change now
> rather than as a follow-up. Like so? The second patch is effectively
> unchanged.
>
> Not only is it cleaner, using a base class enables us to store frames such
> as _S_fopcode_next that don't need the extra _M_val/_M_pos space on a separate
> stack of _ExecutorFrameBase objects which would help save space. I'll
> experiment
> with that.
This optimization (essentially array-of-struct -> struct-of-array) has
zero benefit for ECMAScript + DFS mode because the "bare" frames (that
don't need the _M_val/_M_pos space) are always at the top of the stack
and get immediately popped. So the separate stack of bare frames is
always size 1. I haven't measured how beneficial it is in other modes
because it seems not worth the complexity if the default and by far most
commont mode doesn't benefit from it.