> On Jan 15, 2019, at 10:09 AM, Walt Karas <wka...@oath.com.INVALID> wrote:
>
> When we remap we go through the list of regexs looking for a match.
> What if we took all the regexes, removed subpattern captures, and
> created one giant pattern with alternation and capture of the regex
> for each rule, like:
>
> (regex1)|(regex2)|...|(regexN)
>
> We could then just step through the array of subpattern capture
> offsets to find the matching rule. Then do a second match against
> just the original rule regex to get any captures it contains. Would
> this work, and would it be faster?
That could potentially be a lot slower, no? Imagine you have a remap today with
regex_map <regex1> ...
regex_map <regex2> ...
regex_map <regex3> ...
regex_map <regex4> …
If you know what you are doing, and have some luck, you could order these such
as the most common regex is matched early. And first match wins, and
therefore, you could stop evaluating regexes early. Even in an average case,
you’d only evaluate half of the regexes, right?
But if you combine all these into one regex, you essentially have to evaluate
all regexes all the time, regardless of order and regardless if something has
matched.
— Leif