C-style syntax is hard to parse in general, and regex literals are particularly tricky. Many kinds of tools (syntax highlighters, minifiers, etc.) need to parse them accurately, but unfortunately most ECMAScript-based tools don't. (I could start naming high-profile tools with edge-case regex-syntax parsing bugs, but it should be obvious that the task is far from trivial.) The ES4 regex proposals make this even harder in several ways. Worst of all, from the perspective of regex syntax parser complexity, is the java.util.regex-inspired character class subtraction and intersection syntax, which lets character classes nest to arbitrary depth.
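To make the difficulty concrete, here is a rough sketch of the scanning step a highlighter or minifier needs just to find where a regex literal ends. The function name `findRegexLiteralEnd` is made up for illustration, and the sketch assumes the caller has already decided that the slash at `start` opens a regex rather than being a division operator (which is a separate problem). It also assumes the common behavior where an unescaped `/` inside a character class does not terminate the literal.

```javascript
// Hypothetical helper (not from any real tool): given source text and the
// index of a regex literal's opening slash, return the index of the closing
// slash, or -1 if the literal is unterminated.
function findRegexLiteralEnd(src, start) {
  let inClass = false; // true while scanning inside [...]
  for (let i = start + 1; i < src.length; i++) {
    const ch = src[i];
    if (ch === "\\") {
      i++; // skip the escaped character, whatever it is
    } else if (ch === "\n" || ch === "\r") {
      return -1; // regex literals cannot span lines
    } else if (inClass) {
      if (ch === "]") inClass = false; // only "]" ends a class
    } else if (ch === "[") {
      inClass = true;
    } else if (ch === "/") {
      return i; // unescaped slash outside a class closes the literal
    }
  }
  return -1;
}

console.log(findRegexLiteralEnd("/[/]/g", 0));   // 4
console.log(findRegexLiteralEnd("/a\\/b/", 0));  // 5
```

A single boolean is enough here only because classes cannot nest. With nested classes, `inClass` would presumably have to become a depth counter, and a pure regex-based scanner could no longer track it at all.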
Now, I understand that the feature is powerful (and I assume also quite useful for regexes that make heavy use of ES4's Unicode property tokens), but it effectively makes it impossible to parse ES4 regex syntax using ES4 regexes, which lack the recursion support of PCRE and Perl (or .NET's balancing groups). And considering that java.util.regex is the only major regex library to include full character class set operations (.NET only does class subtraction), I don't think people would miss the feature all that much. Of course, adding recursion support to existing regex syntax parsers is probably not that difficult in most cases. Nevertheless, I'm interested in what others think about the character class subtraction and intersection features. Personally, I think allowing only one level of character class nesting might be a reasonable compromise, especially since people could emulate more levels of nesting using lookahead anyway.
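As a rough illustration of the lookahead emulation mentioned above, here is a sketch in plain ECMAScript regex syntax. The Java patterns in the comments are only there for comparison, and the variable names are arbitrary. The idea is that a lookahead placed in front of an ordinary character class constrains the same character position, so a positive lookahead acts as intersection and a negative lookahead acts as subtraction.

```javascript
// Subtraction: Java's [a-z&&[^aeiou]] (lowercase letters minus vowels).
const consonant = /(?![aeiou])[a-z]/g;

// Intersection: Java's [g-z&&[a-m]] (letters in both ranges, i.e. g-m).
const inBoth = /(?=[a-m])[g-z]/g;

// Stacking lookaheads applies several set operations to one position:
// in a-m, and in g-z, and not a vowel.
const stacked = /(?=[g-z])(?![aeiou])[a-m]/g;

console.log("hello world".match(consonant)); // [ 'h', 'l', 'l', 'w', 'r', 'l', 'd' ]
console.log("flight".match(inBoth));         // [ 'l', 'i', 'g', 'h' ]
console.log("flight".match(stacked));        // [ 'l', 'g', 'h' ]
```

Each additional lookahead stands in for one more set operation, so deeper combinations can be written flat, without any nesting in the class syntax itself.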
