Eric Corry wrote:
> However, I think we probably do want the /u modifier on regexps to
> control the new backward-incompatible behaviour. There may be some
> way to relax this for regexp literals in opted-in Harmony code, but
> for new RegExp(...) and for other string literals I think there are
> rather too many inconsistencies with the old behaviour.
I disagree with adding /u for this purpose, and I disagree with breaking
backward compatibility so that `/./.exec(s)[0].length == 2`. Instead, if
this is deemed an important enough issue, there are two ways to match any
Unicode grapheme that follow existing regex library precedent:
From Perl and PCRE:
    \X
From Perl, PCRE, .NET, Java, XML Schema, and ICU (among others):
    \P{M}\p{M}*
Obviously \X is prettier, but because it's fairly rare for people to care
about this, IMO the more widely compatible solution that uses Unicode
categories is Good Enough if Unicode category syntax is on the table for
ES6.
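To make both points concrete, here is a minimal JavaScript sketch. It assumes a present-day engine: Unicode property escapes such as \p{M} were only standardized later (ES2018), and JavaScript accepts them only together with the u flag, so the category-based pattern below is written with /u, while the `.` check uses legacy (non-/u) semantics. The helper name splitGraphemes is purely illustrative.

```javascript
// Legacy (non-/u) semantics: "." matches a single UTF-16 code unit,
// so an astral character such as U+1F600 is matched one surrogate at a time.
const astral = "\u{1F600}"; // stored as the surrogate pair "\uD83D\uDE00"
const legacyDot = /./.exec(astral)[0].length; // 1, not 2

// Category-based approximation of a grapheme:
// one non-mark character followed by any number of combining marks.
// (Assumption: an engine with Unicode property escapes, which need the u flag.)
const graphemeLike = /\P{M}\p{M}*/gu;

// Hypothetical helper: split a string into mark-aware chunks.
function splitGraphemes(text) {
  return text.match(graphemeLike) ?? [];
}

// "e" + COMBINING ACUTE ACCENT, then "a", then U+1F600
const sample = "e\u0301a\u{1F600}";
console.log(legacyDot); // 1
console.log(splitGraphemes(sample).map((g) => g.length)); // [ 2, 1, 2 ]
```

This is only the category-based approximation (one non-mark character plus trailing combining marks), not a full implementation of Unicode extended grapheme clusters as defined in UAX #29; a ZWJ emoji sequence or a run of Hangul conjoining jamo, for example, would still be split into several pieces.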
Norbert Lindenberg wrote:
> \uxxxx[\uyyyy-\uzzzz] is interpreted as [\uxxxx\uyyyy-\uxxxx\uzzzz]
> [\uwwww-\uxxxx][\uyyyy-\uzzzz] is interpreted as [\uwwww\uyyyy-\uxxxx\uzzzz]
> This transformation is rather ugly, but I’m afraid it’s the price
> ECMAScript has to pay for being 12 years late in supporting
> supplementary characters.
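As a worked example of the quoted rule, the following JavaScript sketch decodes a lead surrogate followed by a class of trail surrogates into the single code point range the proposal would read it as. The function names fromSurrogates and pairRange are hypothetical, and only the first form (\uxxxx[\uyyyy-\uzzzz]) is shown; the endpoints of the second form come from the same decoding applied to (wwww, yyyy) and (xxxx, zzzz).

```javascript
// Combine a UTF-16 surrogate pair into the code point it encodes.
function fromSurrogates(lead, trail) {
  if (lead < 0xd800 || lead > 0xdbff) throw new RangeError("not a lead surrogate");
  if (trail < 0xdc00 || trail > 0xdfff) throw new RangeError("not a trail surrogate");
  return 0x10000 + ((lead - 0xd800) << 10) + (trail - 0xdc00);
}

// Hypothetical helper: the code point range denoted by \uLEAD[\uLO-\uHI]
// when it is read as the single range [\uLEAD\uLO-\uLEAD\uHI].
function pairRange(lead, trailLo, trailHi) {
  return [fromSurrogates(lead, trailLo), fromSurrogates(lead, trailHi)];
}

const hex = (cp) => "U+" + cp.toString(16).toUpperCase();

// \uD83D[\uDE00-\uDE4F]  ->  U+1F600 .. U+1F64F
const [lo, hi] = pairRange(0xd83d, 0xde00, 0xde4f);
console.log(hex(lo), hex(hi)); // U+1F600 U+1F64F

// Cross-check: the lower endpoint re-encodes to the original surrogate pair.
console.log(String.fromCodePoint(lo) === "\uD83D\uDE00"); // true
```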
Yikes! -1! This is unnecessary if the handling of \uhhhh is unmodified and
support for \u{h..} and/or \x{h..} is added (the latter is the syntax from
Perl and PCRE). Some people will want a way to match arbitrary Unicode code
points rather than graphemes anyway, so leaving \uhhhh alone lets that use
case continue working. This would still allow modifying the handling of
literal astral/supplementary characters in RegExps. If it can be handled
sensibly, I'm all for treating literal characters in RegExps as discrete
graphemes rather than splitting them into surrogate pairs.
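The following JavaScript sketch illustrates the distinction, assuming a present-day engine. The \u{...} escape used here is the form JavaScript eventually adopted, and it is only recognized with the u flag (the Perl/PCRE \x{...} spelling is not supported). Without /u, \uhhhh still matches a single UTF-16 code unit, and a literal supplementary character is split into a surrogate pair, so a following quantifier repeats only the trail surrogate.

```javascript
const smiley = "\u{1F600}"; // U+1F600, stored as "\uD83D\uDE00"

// \uhhhh keeps matching single UTF-16 code units, so a lone surrogate
// can still be targeted without /u.
const leadOnly = /\uD83D/.test(smiley); // true

// Code point escape (requires /u): matches the whole supplementary character.
const byCodePoint = /^\u{1F600}$/u.test(smiley); // true

// A literal astral character without /u is read as two code units,
// so "+" repeats only the trail surrogate \uDE00.
const twoSmileys = smiley + smiley;
const legacyRun = twoSmileys.match(new RegExp(smiley + "+"))[0].length; // 2
// With /u the literal is one code point, so "+" repeats the whole character.
const codePointRun = twoSmileys.match(new RegExp(smiley + "+", "u"))[0].length; // 4

console.log(leadOnly, byCodePoint, legacyRun, codePointRun); // true true 2 4
```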
--Steven Levithan
_______________________________________________
es-discuss mailing list
es-discuss@mozilla.org
https://mail.mozilla.org/listinfo/es-discuss