Re: Full Unicode based on UTF-16 proposal

Norbert Lindenberg Sat, 17 Mar 2012 11:54:24 -0700

Steven, sorry, I wasn't aware of your proposal for /u when I inserted the note 
on this flag into my proposal. My proposal was inspired by the use of /u in 
PHP, where it switches from byte mode to UTF-8 mode. We'll have to see whether 
it makes sense to combine the two under one flag or use two - fortunately, 
Unicode still has a few other characters.


Norbert


On Mar 17, 2012, at 11:22 , Steven L. wrote:

> Erik Corry wrote:
>>> Disagree with adding /u for this purpose and disagree with breaking backward
>>> compatibility to let `/./.exec(s)[0].length == 2`.
>> 
>> Care to enlighten us with any thinking behind this disagreeing?
> 
> Sorry for the rushed and overly ebullient message. I disagreed with /u for 
> switching from code unit to code point mode because in the moment I didn't 
> think a code point mode necessary or particularly beneficial. Upon further 
> reflection, I rushed into this opinion and will be more closely examining the 
> related issues.
> 
> I further objected because I think the /u flag would be better used as an 
> ASCII/Unicode mode switcher for \d\w\b. My proposal for this is based on 
> Python's re.UNICODE or (?u) flag, which does the same thing except that it 
> also covers \s (which is already Unicode-based in ES). Therefore, I think 
> that if a flag is added that only switches from code unit to code point mode, 
> it should not be "u". Presumably, flag /u could simultaneously affect \d\w\b 
> and switch to code point mode. I haven't yet thought enough about combining 
> these two proposals to hold a strong opinion on the matter.
> 
>>> there are two ways to match any Unicode
>>> grapheme that match existing regex library precedent:
>>> 
>>> From Perl and PCRE:
>>> \X
>> 
>> This doesn't work inside [].  Were you envisioning the same restriction in 
>> JS?
>> 
>> Also it matches a grapheme cluster, which may be useful but is
>> completely different to what the dot does.
> 
> You are of course correct. And yes, I was envisioning the same restriction 
> within character classes. But I'm not a strong proponent of \X, especially if 
> support for Unicode categories is added.
> 
>> I agree with Steven that these two cases should just be left alone,
>> which means they will continue to work the way they have until now.
> 
> Glad to hear it.
> 
>> You seem to be confusing graphemes and Unicode code points.
>> [...]
>> The proposal you are responding to is all about adding Unicode code
>> point handling to regexps.  It is not about adding grapheme support,
>> which is a rather different issue.
> 
> Indeed. My response was rushed and poorly formed. My apologies.
> 
> --Steven Levithan
> 
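
For readers following the code unit / code point / grapheme distinction 
discussed above, here is a small sketch. It assumes an engine that implements 
the "u" flag as later standardized in ES2015, which is not necessarily what 
either proposal in this thread describes. The variable names are only 
illustrative.

```javascript
// U+1F600 lies outside the Basic Multilingual Plane, so UTF-16 stores it
// as a surrogate pair: two code units, one code point.
const astral = "\u{1F600}";

// Without the flag, "." consumes a single code unit (the lead surrogate).
const unitMatch = /./.exec(astral)[0];

// With the ES2015 "u" flag (an assumption relative to this thread),
// "." consumes the whole code point, i.e. both code units.
const pointMatch = /./u.exec(astral)[0];

console.log(unitMatch.length);  // 1
console.log(pointMatch.length); // 2

// "e" followed by U+0301 COMBINING ACUTE ACCENT: two code points that
// display as one grapheme. Code point mode still matches only the "e",
// which is why grapheme matching (\X) is a separate question.
const combined = "e\u0301";
const baseOnly = /./u.exec(combined)[0];
console.log(baseOnly.length, [...combined].length); // 1 2
```

The second case is the backward-compatibility concern in miniature: 
`/./.exec(s)[0].length` stays 1 unless the flag is given explicitly.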

_______________________________________________
es-discuss mailing list
[email protected]
https://mail.mozilla.org/listinfo/es-discuss
