Thanks for the explanations of the dfa vs. regex.

> > I think I have an obligation at this point to mention:
> >
> > http://swtch.com/~rsc/regexp/
> >
> > In particular, there is code there for an "Efficient (non-backtracking)
> > NFA implementation with submatch tracking. Accepts UTF-8 and
> > wide-character Unicode input. Traditional egrep syntax only. Written by
> > Rob Pike."
> >
> > Perhaps this can serve as the basis for a new unified matcher?
>
> I think this is very much related to the algorithms already in use by
> regex. A unified matcher will always be slower than DFA.

I understood that regex did backtracking - these algorithms, based on
the papers by RSC, are not used by the GNU library. It's worth a careful
review. But this is a long-term issue in any case.

Thanks,

Arnold
