On Wed, Jan 28, 2015 at 11:36 AM, Marja Hölttä <ma...@chromium.org> wrote:
> The ES6 unicode regexp spec is not very clear regarding what should happen
> if the regexp or the matched string contains lonely surrogates (a lead
> surrogate without a trail, or a trail without a lead). For example, for the
> . operator, the relevant parts of the spec speak about characters:
>

Just a bit of terminology. The term "character" is overloaded, so Unicode provides the unambiguous term "code point". For example, U+0378 is not (currently) an encoded character according to Unicode, but it would certainly be a terrible idea to disregard it or to refuse to match it. It is a reserved code point that may be assigned as an encoded character in the future. So neither U+D83D (a lead surrogate) nor U+0378 (a reserved code point) is a character, yet both are code points.

If an ES spec uses the term "character" instead of "code point", then at some point the text needs to state explicitly which one it means.

As to how this should be handled in regular expressions, I'd suggest looking at Java's approach.

Mark
<https://google.com/+MarkDavis>
*Il meglio è l’inimico del bene* (The best is the enemy of the good.)
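To make the code unit vs. code point distinction concrete, here is a minimal TypeScript sketch. The function name `toCodePoints` is hypothetical, not part of the ES spec or any library. It decodes a UTF-16 string into code points by pairing a lead surrogate with an immediately following trail surrogate, and otherwise treats every code unit, including a lone surrogate such as U+D83D, as a code point of its own. That is the same convention ES6's `String.prototype.codePointAt` and string iteration use. Note that it says nothing about which code points are encoded characters: U+0378 passes through like any other code point.

```typescript
// Hypothetical helper (not from the ES spec or any library): decode a
// UTF-16 string into code points. A lead surrogate followed by a trail
// surrogate becomes one supplementary code point; every other code unit,
// including a lone lead or trail surrogate, is kept as a code point of its own.
function toCodePoints(s: string): number[] {
  const out: number[] = [];
  let i = 0;
  while (i < s.length) {
    const unit = s.charCodeAt(i);
    const isLead = unit >= 0xd800 && unit <= 0xdbff;
    if (isLead && i + 1 < s.length) {
      const next = s.charCodeAt(i + 1);
      if (next >= 0xdc00 && next <= 0xdfff) {
        // Combine the surrogate pair into a code point in U+10000..U+10FFFF.
        out.push((unit - 0xd800) * 0x400 + (next - 0xdc00) + 0x10000);
        i += 2;
        continue;
      }
    }
    // BMP code unit or lone surrogate: the code unit value is the code point.
    out.push(unit);
    i += 1;
  }
  return out;
}

toCodePoints("\uD83D\uDE00"); // returns [0x1f600], one code point
toCodePoints("\uD83Dx");      // returns [0xd83d, 0x78], lone lead surrogate kept
toCodePoints("\uDE00\uD83D"); // returns [0xde00, 0xd83d], trail before lead: no pair
toCodePoints("\u0378");       // returns [0x378], reserved but still a code point
```

In an engine that implements ES2015, the result should agree with `Array.from(s, (c) => c.codePointAt(0))`. The sketch only covers decoding. It does not settle how the regexp `.` operator should treat a lone surrogate.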
_______________________________________________
es-discuss mailing list
es-discuss@mozilla.org
https://mail.mozilla.org/listinfo/es-discuss