Re: Full Unicode based on UTF-16 proposal

David Herman Sat, 24 Mar 2012 23:59:11 -0700

On Mar 24, 2012, at 11:23 PM, Norbert Lindenberg wrote:

> On Mar 24, 2012, at 12:21 , David Herman wrote:
> 
>> I'm still getting up to speed on Unicode and JS string semantics, so I'm 
>> guessing that I'm missing a reason why that wouldn't work... Presumably the 
>> JS source of the regexp literal, as a sequence of UTF-16 code units, 
>> represents the tetragram code points as surrogate pairs. Can we not 
>> recognize surrogate pairs in character classes within a /u regexp and 
>> interpret them as code points?
> 
> With /u, that's exactly what happens. My first proposal was to make this 
> happen even without a new flag, i.e., make
> "𝌆𝌇𝌈𝌉𝌊".match(/[𝌆-𝍖]+/)
> work based on code points, and Steve is arguing against that because of 
> compatibility risk. My proposal also includes some transformations to keep 
> existing regular expressions working, and Steve correctly observes that if we 
> have a flag for code point mode, then the transformation is not needed - old 
> regular expressions would continue to work in code unit mode, while new 
> regular expressions with /u get code point treatment.


Excellent!
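
To make the difference concrete, here is a minimal sketch of the two modes, assuming /u behaves as proposed in this thread (a "u" flag with these semantics later shipped in ES2015). The names text and classSource are only illustrative. The regexps are built with the RegExp constructor rather than as literals because, in code unit mode, the range in the class runs from a low surrogate to a high surrogate, which engines reject as out of order; as a literal that would be an early error for the whole script.

```javascript
// Five tetragram symbols, U+1D306..U+1D30A, each stored as a surrogate pair.
const text = "\u{1D306}\u{1D307}\u{1D308}\u{1D309}\u{1D30A}";

// Source text of the character class from the thread: [U+1D306-U+1D356]+
const classSource = "[\u{1D306}-\u{1D356}]+";

// Code point mode: the class is read as a range of code points,
// so the whole string matches.
const unicodeMatch = text.match(new RegExp(classSource, "u"));

// Code unit mode: the class is read as the code units
// D834, DF06-D834, DF56, and the middle range is out of order.
let legacyError = null;
try {
  new RegExp(classSource);
} catch (e) {
  legacyError = e;
}

// "." shows the same split: one code unit without /u, one code point with it.
const dotCodeUnits = text.match(/^./)[0].length;
const dotCodePoints = text.match(/^./u)[0].length;
```

Existing regexps without the flag keep their code unit behavior, which is the compatibility point Steve raised.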

Thanks,
Dave

