On 26 Aug 2014, at 19:01, Allen Wirfs-Brock <[email protected]> wrote:
> I've thought about this a bit. I was initially inclined to agree with the
> idea of extending the existing character classes similar to what Mathias'
> proposes. But I now think that is probably not a very good idea and that
> what is currently spec'ed (essentially that the /u flag doesn't change the
> meaning of \w, \d, etc.) is the better path. […] It seems to me, that we want
> programmers to start migrating to full Unicode regular expressions without
> having to do major logic rewrite of their code. For example, ideally the
> above expression could simply be replaced by
> `parseInt(/\s*(\d+)/u.exec(input)[1])` and everything in the application
> could continue to work unchanged.

I see your point, but I disagree with the notion that we must absolutely maintain backwards compatibility in this case. The fact that the new flag is opt-in gives us an opportunity to improve behavior without obsessing about back-compat, similar to how the strict mode opt-in is used to make all sorts of things better. When [evangelizing `/u`](https://mathiasbynens.be/notes/es6-unicode-regex), we can educate developers and tell them to not blindly/needlessly add `/u` to their existing regular expressions.

> Instead, we should leave the definitions of \d, \w and \s unchanged and plan
> to adopt the already established convention that `\p{<Unicode property>}` is
> the notation for matching Unicode categories. See
> http://www.regular-expressions.info/unicode.html

We could do both: improve `\d` and `\w` now, and add `\p{property}` and `\P{property}` later. Anyhow, I’ve filed https://bugs.ecmascript.org/show_bug.cgi?id=3157 for reserving `\p{…}`/`\P{…}`.

> I think digesting all the \p{} possibilities is too much to do for ES6, so I
> suggest that for ES6 that we simply reserve the \p{<characters>} and
> \P{<characters>} syntax within /u patterns. A \p proposal can then be
> developed for ES7.

Sounds good to me.

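To make the trade-off concrete, here is a minimal sketch of the quoted `parseInt` example under both readings of `\d`. It assumes an engine that supports the `/u` flag and also the `\p{…}` property escapes, which only became standard later (ES2018) and were not available when this thread was written. The helper `leadingNumber` and the sample strings are hypothetical, introduced purely for illustration.

```javascript
// Sketch only: compares `\d` under /u (as spec'ed, ASCII digits only)
// with a Unicode-aware digit class written as \p{Nd}.
// Assumption: the engine supports /u and ES2018 property escapes.

const asciiInput = "  42 apples";
const arabicIndicInput = "  \u0664\u0662 apples"; // ARABIC-INDIC DIGITS FOUR, TWO

// Hypothetical helper modelled on the quoted parseInt example.
// Returns null when the pattern does not match at all.
function leadingNumber(input, pattern) {
  const match = pattern.exec(input);
  return match === null ? null : parseInt(match[1], 10);
}

const legacyPattern = /\s*(\d+)/;          // no /u flag
const unicodePattern = /\s*(\d+)/u;        // /u flag, \d unchanged
const propertyPattern = /\s*(\p{Nd}+)/u;   // any Unicode decimal digit

const digitResults = {
  legacyAscii: leadingNumber(asciiInput, legacyPattern),
  unicodeAscii: leadingNumber(asciiInput, unicodePattern),
  // \d finds no ASCII digit in this input, so there is no match.
  unicodeArabicIndic: leadingNumber(arabicIndicInput, unicodePattern),
  // \p{Nd} matches the digits, but parseInt only understands ASCII digits.
  propertyArabicIndic: leadingNumber(arabicIndicInput, propertyPattern),
};

console.log(digitResults);
```

The last entry is the case Allen's argument is about: a digit class that matches non-ASCII digits hands `parseInt` a string it cannot parse, so code that merely adds `/u` would change behaviour if `\d` were extended.
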
> I see one remaining issue:
>
> In ES5 (and ES6): `/a-z/i` does not match U+017F (ſ) or U+212A (K) because
> the ES canonicalization algorithm excludes mapping code points > 127 that
> toUpperCase to code points <128.
>
> However, as currently spec'ed, the ES6 canonicalization algorithm for /u
> RegExps does not include that >127/<128 exclusion. It maps U+017F to "S"
> which matches.
>
> This is probably a minor variation, from the ES5 behavior, but we should
> probably be sure it is a desirable and tolerable change as we presumably
> could also apply the >127/<128 filter to /u canonicalization.

This is a useful feature, and the explicit opt-in makes the small back-compat break acceptable IMHO.
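
The canonicalization difference can be checked directly. The sketch below assumes that the quoted `/a-z/i` is shorthand for the character class `[a-z]`, and that the engine implements the `/u` case-insensitive matching without the >127/<128 exclusion, which is the behaviour under discussion here. The variable names are illustrative only.

```javascript
// Sketch only: case-insensitive matching of two non-ASCII code points
// against an ASCII range, with and without the /u flag.
// Assumption: `/a-z/i` in the quoted text means the class `[a-z]`.

const longS = "\u017F";       // LATIN SMALL LETTER LONG S
const kelvinSign = "\u212A";  // KELVIN SIGN

const withoutUnicodeFlag = /[a-z]/i;
const withUnicodeFlag = /[a-z]/iu;

const caseResults = {
  // Without /u, code points > 127 are not mapped onto ASCII letters.
  longSWithoutU: withoutUnicodeFlag.test(longS),
  kelvinWithoutU: withoutUnicodeFlag.test(kelvinSign),
  // With /u, both code points canonicalize to an ASCII letter and match.
  longSWithU: withUnicodeFlag.test(longS),
  kelvinWithU: withUnicodeFlag.test(kelvinSign),
};

console.log(caseResults);
```

Neither pattern uses the `g` flag, so repeated `test` calls carry no `lastIndex` state between them.
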

