Eric Corry wrote:
> However I think we probably do want the /u modifier on regexps to
> control the new backward-incompatible behaviour.  There may be some
> way to relax this for regexp literals in opted in Harmony code, but
> for new RegExp(...) and for other string literals I think there are
> rather too many inconsistencies with the old behaviour.

I disagree with adding /u for this purpose, and I disagree with breaking backward compatibility to make `/./.exec(s)[0].length == 2` when s begins with a supplementary character. Instead, if this is deemed an important enough issue, there are two ways to match any Unicode grapheme that follow existing regex library precedent:

From Perl and PCRE:

\X

From Perl, PCRE, .NET, Java, XML Schema, and ICU (among others):

\P{M}\p{M}*

Obviously \X is prettier, but because it's fairly rare for people to care about this, IMO the more widely compatible solution that uses Unicode categories is Good Enough if Unicode category syntax is on the table for ES6.
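For illustration, here is a minimal JavaScript sketch of the category-based approach, assuming an engine with ES2018 Unicode property escapes. Note that, as finally standardized, `\p{…}` and `\P{…}` only work under the `u` flag, and JavaScript regexps have no `\X`. The helper name `approxGraphemes` is hypothetical. The sketch also shows that plain `/./` (no `u`) still matches a single UTF-16 code unit, which is the backward-compatible behaviour argued for above.

```javascript
// Hypothetical helper: split a string into approximate graphemes using
// \P{M}\p{M}* (one non-mark code point followed by any combining marks).
// Assumes ES2018 property escapes, which require the u flag.
function approxGraphemes(s) {
  return s.match(/\P{M}\p{M}*/gu) || [];
}

// U+1F600 is stored as the surrogate pair \uD83D\uDE00.
const smiley = "\u{1F600}";

// Without u, "." matches one UTF-16 code unit (the high surrogate).
console.log(/./.exec(smiley)[0].length);  // 1
// With u, "." matches the whole code point (two code units).
console.log(/./u.exec(smiley)[0].length); // 2

// "e" + U+0301 COMBINING ACUTE ACCENT stays together as one unit,
// so this yields three pieces: "e\u0301", "a", and the smiley.
console.log(approxGraphemes("e\u0301a" + smiley).length); // 3
```

This pattern only approximates grapheme clusters (a string that starts with a combining mark leaves that mark unmatched). It is the "Good Enough" trade-off described above, not a full implementation of \X.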

Norbert Lindenberg wrote:
> \uxxxx[\uyyyy-\uzzzz] is interpreted as [\uxxxx\uyyyy-\uxxxx\uzzzz]
> [\uwwww-\uxxxx][\uyyyy-\uzzzz] is interpreted as [\uwwww\uyyyy-\uxxxx\uzzzz]
>
> This transformation is rather ugly, but I’m afraid it’s the price ECMAScript
> has to pay for being 12 years late in supporting supplementary characters.

Yikes! -1! This is unnecessary if the handling of \uhhhh is left unmodified and support for \u{h..} and/or \x{h..} is added (the latter is the syntax from Perl and PCRE). Some people will want a way to match arbitrary Unicode code points rather than graphemes anyway, so leaving \uhhhh alone lets that use case continue working. This approach would still allow changing how literal astral (supplementary) characters in RegExps are handled. If that can be done sensibly, I'm all for treating literal characters in RegExps as discrete graphemes rather than splitting them into surrogate pairs.
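A small JavaScript sketch of the distinction being argued, assuming an ES2015+ engine. As finally standardized, `\u{…}` escapes in regexps are recognized only under the `u` flag, and the Perl/PCRE `\x{…}` form is not valid JavaScript. Without `u`, `\uD83D[\uDE00-\uDE4F]` is just a lead surrogate followed by a class of trail surrogates, and it matches the same block of characters as the code point range `[\u{1F600}-\u{1F64F}]`. That is the kind of equivalence the first transformation rule above relies on. The last two lines show that an unmodified `\uhhhh` keeps matching individual code units outside `u` mode.

```javascript
// U+1F600 is stored in UTF-16 as the surrogate pair \uD83D\uDE00.
const face = "\u{1F600}";

// Old-style pattern: lead surrogate followed by a range of trail surrogates
// (U+1F600..U+1F64F all share the lead surrogate \uD83D).
const surrogatePattern = /^\uD83D[\uDE00-\uDE4F]$/;
// The same range written as code points with \u{...}; requires the u flag.
const codePointPattern = /^[\u{1F600}-\u{1F64F}]$/u;

console.log(surrogatePattern.test(face)); // true
console.log(codePointPattern.test(face)); // true

// Without u, \uD83D matches the lone code unit inside the pair...
console.log(/\uD83D/.test(face));  // true
// ...while with u the pair is a single code point, so it no longer matches.
console.log(/\uD83D/u.test(face)); // false
```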

--Steven Levithan

_______________________________________________
es-discuss mailing list
[email protected]
https://mail.mozilla.org/listinfo/es-discuss
