On Mon, Jan 28, 2013 at 7:24 AM, j. van den hoff <[email protected]> wrote:
> On Mon, 28 Jan 2013 12:15:22 +0100, Stephan Beal <[email protected]>
> wrote:
>
> I'm quite sure that this is _not_ a standard regexp lib, but rather lua's
> own (and somewhat different) substitute, called lua patterns, I believe.
> the lua authors used to make a point of the fact, that the whole lua
> implementation is smaller (in terms of LOC) than the usual regexp libs...
>
> the syntax for lua patterns is somewhat different from regexp syntax (but
> quite nice/reasonable: eg. one or more white space chars would be "%s+",
> "%w" is an alphanumeric char, etc. there are also "captures"
> (backreferences to matched subpatterns), so capabilities are good,
> sometimes maybe superior to standard regexp). the one thing which used to
> be missing (and quite probably still is) is logical alternation (OR
> combination within patterns, such as "(apple|banana)").
>

Indeed - I don't find '|' anywhere in the source code, so I think this
capability is missing. I've only glanced at lua-regexp.c, but it appears to
match the regexp string directly against the input string, using
backtracking. In this sense it is similar to the GLOB operator implemented at
http://www.fossil-scm.org/fossil/artifact/3d47a43dc9a?ln=155-217
but with a lot of additional logic to deal with substitutions and the
expanded functionality of regular expressions. This is a clever design. All
other regexp engines that I've seen (including the one in
www.fossil-scm.org/fossil/artifact/c8fb75a1615f) first compile the regexp
into a state machine, then run the state machine over the input string.
Possibly this is why the lua-regexp engine does not handle '|': doing so is
difficult without a certain amount of preprocessing of the regular
expression text.

>
> this is only to clarify the situation. otherwise I fully support this
> proposal (since the lua patterns implementation is quite powerful and also
> very light, much lighter than perl or standard regexp).
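To make "direct matching with backtracking" concrete, here is a toy matcher in C in the style of the well-known Kernighan/Pike matcher, borrowing a few Lua-pattern-like items (`.`, `%s`, `%w`, `%d`, `*`, `+`, `^`, `$`). It is a hypothetical sketch, not the code in lua-regexp.c or in Lua itself, and the names `match`, `match_here`, `match_star`, `single_match` and `item_len` are made up for illustration. Like Lua patterns it has no `|`: the pattern is walked one item at a time against the input, and a quantifier simply retries shorter runs whenever the rest of the pattern fails.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical toy matcher (not lua-regexp.c): direct backtracking over
   the pattern text, with no compilation step and no '|' support. */

/* Does input character c match the single pattern item at p?
   An item is '.', a Lua-style class such as "%s", or a literal. */
static int single_match(const char *p, char c) {
    if (c == '\0') return 0;               /* never match past end of input */
    if (*p == '.') return 1;               /* any character */
    if (*p == '%' && p[1] != '\0') {
        switch (p[1]) {
        case 's': return isspace((unsigned char)c) != 0;
        case 'w': return isalnum((unsigned char)c) != 0;
        case 'd': return isdigit((unsigned char)c) != 0;
        default:  return c == p[1];        /* "%x" escapes a literal x */
        }
    }
    return c == *p;                        /* literal character */
}

/* Length of the pattern item at p: 2 for "%x", otherwise 1. */
static size_t item_len(const char *p) {
    return (p[0] == '%' && p[1] != '\0') ? 2 : 1;
}

static int match_here(const char *p, const char *s);

/* Greedy "item*": consume as many matching characters as possible, then
   back off one character at a time until the rest of the pattern matches. */
static int match_star(const char *item, const char *rest, const char *s) {
    const char *t = s;
    while (single_match(item, *t)) t++;
    for (;;) {
        if (match_here(rest, t)) return 1;
        if (t == s) return 0;
        t--;                               /* backtrack */
    }
}

/* Does pattern p match a prefix of s? */
static int match_here(const char *p, const char *s) {
    if (*p == '\0') return 1;
    if (p[0] == '$' && p[1] == '\0') return *s == '\0';
    size_t n = item_len(p);
    if (p[n] == '*') return match_star(p, p + n + 1, s);
    if (p[n] == '+')                       /* one match, then "item*" */
        return single_match(p, *s) && match_star(p, p + n + 1, s + 1);
    if (single_match(p, *s)) return match_here(p + n, s + 1);
    return 0;
}

/* Returns 1 if p matches anywhere in s ('^' anchors at the start). */
int match(const char *p, const char *s) {
    if (*p == '^') return match_here(p + 1, s);
    do {
        if (match_here(p, s)) return 1;
    } while (*s++ != '\0');
    return 0;
}
```

For example, `match("%w+%s+%d+", "abc   42")` returns 1 and `match("^%d+$", "12a")` returns 0. Because every quantified item may retry every possible run length, the work can grow very quickly for patterns such as many consecutive `a*` items followed by `b`, matched against a long run of `a`s.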

The regular expression matching in
www.fossil-scm.org/fossil/artifact/c8fb75a1615f is also lightweight, it
supports '|', and it is usually as fast as or faster than grep in my tests
(though there are some cases for which grep is faster). The regexp.c in
Fossil uses an NFA, which gives worst-case performance of O(NM), where N is
the size of the input text to be matched and M is the size of the regular
expression. Perl regular expressions and lua-regexp.c take exponential time
for some (admittedly obscure) regular expressions. On the other hand, Perl
regular expressions are more complete, and both Lua and Perl allow you to do
substitutions, which the regexp.c file in Fossil does not.

--
D. Richard Hipp
[email protected]
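The O(NM) bound mentioned above is what one gets by simulating an NFA as a set of active states: each input character is examined once, and each of the at most M states is visited at most once per character, so there is no backtracking. The C sketch below illustrates that idea under stated assumptions; it is not Fossil's regexp.c, it supports only literals, `.`, postfix `*` and top-level `|` (no parentheses), and the names `nfa_search`, `compile`, `Prog`, `Item` and `StateSet` are hypothetical. Note that `|` needs no special machinery here: each branch is just one more start state running in parallel with the others.

```c
#include <string.h>

/* Hypothetical sketch of NFA simulation (not Fossil's regexp.c). */
#define MAX_ITEMS 128

enum { LIT, ANY, ACCEPT };

typedef struct { int kind; char c; int star; } Item;

typedef struct {
    Item items[MAX_ITEMS];
    int n_items;
    int starts[MAX_ITEMS + 1];             /* first item of each '|' branch */
    int n_starts;
} Prog;

typedef struct {
    int list[MAX_ITEMS];                   /* active states, in insertion order */
    int n;
    unsigned char on[MAX_ITEMS];           /* membership flags */
} StateSet;

/* Compile a pattern of literals, '.', postfix '*' and top-level '|'.
   Each branch ends in an ACCEPT item. Returns 0 if the pattern is too long. */
static int compile(const char *pat, Prog *prog) {
    prog->n_items = 0;
    prog->n_starts = 0;
    prog->starts[prog->n_starts++] = 0;
    for (const char *p = pat; ; p++) {
        if (prog->n_items >= MAX_ITEMS) return 0;
        Item *it = &prog->items[prog->n_items];
        if (*p == '\0' || *p == '|') {
            it->kind = ACCEPT; it->c = 0; it->star = 0;
            prog->n_items++;
            if (*p == '\0') return 1;
            prog->starts[prog->n_starts++] = prog->n_items;
        } else if (*p == '*' && prog->n_items > 0 &&
                   prog->items[prog->n_items - 1].kind != ACCEPT) {
            prog->items[prog->n_items - 1].star = 1;
        } else {                           /* '.' or a literal (incl. a stray '*') */
            it->kind = (*p == '.') ? ANY : LIT; it->c = *p; it->star = 0;
            prog->n_items++;
        }
    }
}

/* Add state i plus its epsilon closure: a starred item may be skipped. */
static void add_state(const Prog *prog, StateSet *set, int i) {
    if (set->on[i]) return;
    set->on[i] = 1;
    set->list[set->n++] = i;
    if (prog->items[i].star) add_state(prog, set, i + 1);
}

/* Returns 1 if any substring of s matches pat, 0 if none does,
   -1 if pat is too long. Work per input character is O(M). */
int nfa_search(const char *pat, const char *s) {
    Prog prog;
    StateSet a, b, *cur = &a, *next = &b;
    if (!compile(pat, &prog)) return -1;
    memset(cur, 0, sizeof *cur);
    for (;; s++) {
        /* Unanchored search: a match may begin at every position. */
        for (int k = 0; k < prog.n_starts; k++)
            add_state(&prog, cur, prog.starts[k]);
        for (int k = 0; k < cur->n; k++)
            if (prog.items[cur->list[k]].kind == ACCEPT) return 1;
        if (*s == '\0') return 0;
        memset(next, 0, sizeof *next);
        for (int k = 0; k < cur->n; k++) {
            int i = cur->list[k];
            const Item *it = &prog.items[i];
            if (it->kind == ACCEPT) continue;
            if (it->kind == ANY || it->c == *s)
                add_state(&prog, next, it->star ? i : i + 1);
        }
        StateSet *tmp = cur; cur = next; next = tmp;
    }
}
```

With this sketch, `nfa_search("apple|banana", "I like bananas")` returns 1 and `nfa_search("apple|banana", "cherry")` returns 0, while a pattern like `a*a*a*a*a*a*a*a*b` against a long run of `a`s is still handled in a single pass over the input.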
_______________________________________________
fossil-users mailing list
[email protected]
http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users

