https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88947
--- Comment #8 from Jonathan Wakely <redi at gcc dot gnu.org> ---
There are a few optimizations we should implement in the executor (which won't
require ABI changes to the actual NFA structure).
For regex_search with a pattern that starts with ^, we should fail fast if it
doesn't match at the start of the input.
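A minimal user-level sketch of the ^ fail-fast idea might look like this. It assumes the default ECMAScript grammar without multiline, and that the leading ^ anchors the whole expression (no top-level alternation such as "^a|b"). The helper name anchored_search and the textual check on pattern_text are hypothetical: the real optimization would live in the executor and inspect the NFA instead. The sketch relies on the standard match_continuous flag to stop regex_search from retrying at later positions. It falls back to an ordinary search whenever match_not_bol or match_prev_avail is passed, because those flags change what ^ means at the start of the input.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper, not part of libstdc++. Assumptions:
//  * pattern_text is the string `re` was constructed from;
//  * ECMAScript grammar, multiline NOT set;
//  * a leading '^' anchors the whole pattern (no top-level '|').
// Under those assumptions a match can only begin at position 0.
bool anchored_search(const std::string& input, const std::string& pattern_text,
                     const std::regex& re, std::smatch& m,
                     std::regex_constants::match_flag_type flags =
                         std::regex_constants::match_default)
{
    namespace rc = std::regex_constants;
    const bool anchored = !pattern_text.empty() && pattern_text[0] == '^';
    // match_not_bol / match_prev_avail change whether '^' can match at the
    // start, so don't take the shortcut when either one is present.
    const bool bol_flags = (flags & (rc::match_not_bol | rc::match_prev_avail))
                           != rc::match_default;
    if (anchored && !bol_flags) {
        // Fail fast: only the attempt starting at input.begin() is made.
        return std::regex_search(input, m, re, flags | rc::match_continuous);
    }
    return std::regex_search(input, m, re, flags);  // ordinary search
}
```

For an input that doesn't start with the anchored text, the helper should give up after a single attempt instead of retrying the match at every later position.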
For regex_search with a pattern that starts with a literal char (and icase
isn't in the syntax options), we should use std::find to find the first
occurrence of that char in the input, and only try the regex there (moving on
to the next occurrence if that attempt fails). That will be much faster than
applying the regex matcher at every position of the input. For example, when
searching "abcdef" for the regex "de", we should skip the start of the input and
try to match "def" directly, as that will be much faster than checking "abcdef"
against the regex, then checking "bcdef" and then checking "cdef".
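The std::find idea can be sketched outside the library, assuming the caller already knows that every match of re must begin with the literal char `first`. That means no icase, no top-level alternation, and a leading literal that isn't optional. literal_prefix_search is a hypothetical name. Each candidate position found by std::find is tried with match_continuous. When the candidate isn't the start of the input, match_prev_avail tells the matcher that the preceding character is still valid, so assertions such as \b see the real previous character rather than a start-of-input boundary.

```cpp
#include <algorithm>
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper, not part of libstdc++. Assumption: every match of
// `re` starts with the literal character `first`, and icase is not set.
bool literal_prefix_search(const std::string& input, char first,
                           const std::regex& re, std::smatch& m)
{
    namespace rc = std::regex_constants;
    const auto begin = input.cbegin();
    const auto end = input.cend();
    // Jump straight to each occurrence of `first` instead of running the
    // matcher at every position of the input.
    for (auto it = std::find(begin, end, first); it != end;
         it = std::find(it + 1, end, first)) {
        rc::match_flag_type flags = rc::match_continuous;  // match must start at `it`
        if (it != begin)
            flags = flags | rc::match_prev_avail;  // *(it - 1) is valid input
        if (std::regex_search(it, end, m, re, flags))
            return true;  // leftmost candidate that matches wins
    }
    return false;
}
```

Note that the offsets in `m` are relative to the candidate iterator, so the absolute position is `m[0].first - input.cbegin()`.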
For regex_match with a pattern that ends in an atom that is not a repeating
group or optional group, we could try to match at the end of the input first.
That would fail fast for some of the pathological examples, like
'(.*){200}{100}aaa', where we do ridiculous backtracking. If the input doesn't
end with 'a', there's no point even starting: no amount of backtracking will
succeed.
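For the regex_match case, a cheap precheck could look like the following. It assumes the caller knows a literal suffix that every match must end with (for '(.*){200}{100}aaa' that would be "aaa") and that icase is not set. match_with_suffix_check is a hypothetical name; an executor-level version would derive the suffix from the NFA rather than take it as a parameter. Because regex_match must consume the whole input, an input that doesn't end with the suffix is rejected before any backtracking happens.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper, not part of libstdc++. Assumption: every string
// that `re` matches ends with `suffix`, and icase is not set.
bool match_with_suffix_check(const std::string& input, const std::string& suffix,
                             const std::regex& re)
{
    // regex_match has to consume the entire input, so if the input doesn't
    // end with the required suffix, no amount of backtracking can succeed.
    if (input.size() < suffix.size() ||
        input.compare(input.size() - suffix.size(), suffix.size(), suffix) != 0)
        return false;  // fail fast: the matcher never runs
    return std::regex_match(input, re);
}
```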
We do need to be careful about syntax options like multiline (which allows ^ to
match in the middle of an input such as "abc\ndef") and match flags like
match_not_bol (which means that ^ must not match at the start of the input).