Re: Full Unicode based on UTF-16 proposal

Steven L. Sat, 17 Mar 2012 07:39:46 -0700

Erik Corry wrote:

However I think we probably do want the /u modifier on regexps to
control the new backward-incompatible behaviour.  There may be some
way to relax this for regexp literals in opted in Harmony code, but
for new RegExp(...) and for other string literals I think there are
rather too many inconsistencies with the old behaviour.

Disagree with adding /u for this purpose and disagree with breaking backward compatibility to let `/./.exec(s)[0].length == 2`. Instead, if this is deemed an important enough issue, there are two ways to match any Unicode grapheme that match existing regex library precedent:


From Perl and PCRE:

\X

From Perl, PCRE, .NET, Java, XML Schema, and ICU (among others):

\P{M}\p{M}*

Obviously \X is prettier, but because it's fairly rare for people to care about this, IMO the more widely compatible solution that uses Unicode categories is Good Enough if Unicode category syntax is on the table for ES6.
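
For illustration, a minimal sketch of the two points above. It assumes a present-day engine in which a `u` flag and `\p{..}` property escapes exist (both were standardized after this thread, in ES2015 and ES2018, and the property escapes only work together with `u`). The variable names are made up for the example.

```javascript
// Assumption: an engine with the ES2015 `u` flag and ES2018 property escapes.
const astral = "\uD835\uDC00"; // U+1D400, stored as a surrogate pair

// Existing behaviour: `.` matches a single UTF-16 code unit.
const legacyDotLength = /./.exec(astral)[0].length;
// With the `u` flag, `.` matches a whole code point (two code units here).
const unicodeDotLength = /./u.exec(astral)[0].length;

// Category-based approximation of a grapheme: one non-mark, then any marks.
const graphemeLike = /\P{M}\p{M}*/gu;
const text = "e\u0301a"; // "e" + U+0301 COMBINING ACUTE ACCENT, then "a"
const clusters = text.match(graphemeLike);

console.log(legacyDotLength, unicodeDotLength, clusters.length);
```

Note that \P{M}\p{M}* only approximates what \X matches, which is part of why it is described above as merely Good Enough.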


Norbert Lindenberg wrote:

\uxxxx[\uyyyy-\uzzzz] is interpreted as [\uxxxx\uyyyy-\uxxxx\uzzzz]
[\uwwww-\uxxxx][\uyyyy-\uzzzz] is interpreted as [\uwwww\uyyyy-\uxxxx\uzzzz]

This transformation is rather ugly, but I’m afraid it’s the price ECMAScript
has to pay for being 12 years late in supporting supplementary characters.

Yikes! -1! This is unnecessary if the handling of \uhhhh is unmodified and support for \u{h..} and/or \x{h..} is added (the latter is the syntax from Perl and PCRE). Some people will want a way to match arbitrary Unicode code points rather than graphemes anyway, so leaving \uhhhh alone lets that use case continue working. This would still allow modifying the handling of literal astral/supplementary characters in RegExps. If it can be handled sensibly, I'm all for treating literal characters in RegExps as discrete graphemes rather than splitting them into surrogate pairs.
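
A rough sketch of the distinction drawn here, assuming an engine where `\u{h..}` is available. That syntax was later standardized in ES2015, but only under the `u` flag, so this is not exactly the flagless design argued for above. Names are illustrative.

```javascript
// Assumption: an engine supporting ES2015 `\u{h..}` escapes and the `u` flag.
const astralChar = "\uD835\uDC03"; // U+1D403 as a surrogate pair

// Unmodified \uhhhh: each escape matches one UTF-16 code unit.
const matchesLeadUnit = /^\uD835/.test(astralChar);
const matchesPairAsUnits = /^\uD835\uDC03$/.test(astralChar);

// \u{h..}: one escape names a whole code point, so ranges over
// supplementary characters need no rewriting of the character class.
const matchesCodePoint = /^\u{1D403}$/u.test(astralChar);
const matchesRange = /^[\u{1D400}-\u{1D405}]$/u.test(astralChar);

console.log(matchesLeadUnit, matchesPairAsUnits, matchesCodePoint, matchesRange);
```

All four tests succeed, which shows the code-unit and code-point escapes coexisting without the character-class transformation quoted above.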


--Steven Levithan
