On Mar 24, 2012, at 11:23 PM, Norbert Lindenberg wrote:

> On Mar 24, 2012, at 12:21, David Herman wrote:
>
>> I'm still getting up to speed on Unicode and JS string semantics, so I'm
>> guessing that I'm missing a reason why that wouldn't work... Presumably the
>> JS source of the regexp literal, as a sequence of UTF-16 code units,
>> represents the tetragram code points as surrogate pairs. Can we not
>> recognize surrogate pairs in character classes within a /u regexp and
>> interpret them as code points?
>
> With /u, that's exactly what happens. My first proposal was to make this
> happen even without a new flag, i.e., make
>     "𝌆𝌇𝌈𝌉𝌊".match(/[𝌆-𝍖]+/)
> work based on code points, and Steve is arguing against that because of
> compatibility risk. My proposal also includes some transformations to keep
> existing regular expressions working, and Steve correctly observes that if we
> have a flag for code point mode, then the transformation is not needed - old
> regular expressions would continue to work in code unit mode, while new
> regular expressions with /u get code point treatment.
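For reference, the difference between the two modes can be sketched as below. This is only an illustration, and it assumes the `u` flag as it was later standardized in ECMAScript 2015; the proposal under discussion here may have differed in details. The variable names are made up for the example. The patterns are built with `new RegExp` rather than as literals so that the failure in code unit mode can be caught at run time.

```javascript
// The tetragram symbols lie outside the BMP, so each one occupies a
// surrogate pair (two UTF-16 code units) in a JS string.
const first = "\u{1D306}"; // 𝌆 = D834 DF06
const last = "\u{1D356}";  // 𝍖 = D834 DF56
const text = "\u{1D306}\u{1D307}\u{1D308}\u{1D309}\u{1D30A}"; // "𝌆𝌇𝌈𝌉𝌊"

// Code point mode (/u): the class is the range U+1D306..U+1D356,
// so the whole string matches.
const codePointRe = new RegExp("[" + first + "-" + last + "]+", "u");
const codePointMatch = text.match(codePointRe);

// Code unit mode (no flag): the same source is read as the code units
// D834, DF06, '-', D834, DF56. The range DF06-D834 is out of order,
// so the pattern is rejected when it is constructed.
let codeUnitError = null;
try {
  new RegExp("[" + first + "-" + last + "]+");
} catch (e) {
  codeUnitError = e;
}

console.log(text.length);                          // number of code units
console.log(codePointMatch[0] === text);           // whole string matched
console.log(codeUnitError instanceof SyntaxError); // pattern rejected
```

The second case is why a flag keeps old expressions safe: without `u`, a pattern continues to be read code unit by code unit, exactly as before.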
Excellent!

Thanks,
Dave
_______________________________________________
es-discuss mailing list
https://mail.mozilla.org/listinfo/es-discuss

