Steven, sorry, I wasn't aware of your proposal for /u when I inserted the note on this flag into my proposal. My proposal was inspired by the use of /u in PHP, where it switches from byte mode to UTF-8 mode. We'll have to see whether it makes sense to combine the two under one flag or to use two; fortunately, Unicode still has a few other characters.
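To make the two candidate behaviours concrete, here is a rough JavaScript sketch. It contrasts what `.` matches today (one UTF-16 code unit) with what it would match in a code point mode, and shows that \d and \w are ASCII-only while \s already covers Unicode whitespace. The helper names `firstCodeUnit` and `firstCodePoint` are hypothetical, and the code point mode is only emulated by hand (by checking for a surrogate pair); it is not taken from either proposal.

```javascript
// Hypothetical helpers contrasting current ES regex behaviour with an
// emulated code point mode for ".".

// Line terminators, which "." does not match today; assumed to stay
// excluded in a code point mode as well.
const LINE_TERMINATORS = "\n\r\u2028\u2029";

// Code unit mode (current behaviour): "." consumes one UTF-16 code unit.
function firstCodeUnit(s) {
  const m = /^./.exec(s);
  return m === null ? null : m[0];
}

// Emulated code point mode: a high surrogate followed by a low surrogate
// counts as one character, so "." would consume both code units.
function firstCodePoint(s) {
  if (s.length === 0 || LINE_TERMINATORS.includes(s[0])) return null;
  const hi = s.charCodeAt(0);
  if (hi >= 0xd800 && hi <= 0xdbff && s.length > 1) {
    const lo = s.charCodeAt(1);
    if (lo >= 0xdc00 && lo <= 0xdfff) return s.slice(0, 2);
  }
  return s[0]; // BMP character or lone surrogate: one code unit
}

const smiley = "\uD83D\uDE00"; // U+1F600: one code point, two code units
console.log(firstCodeUnit(smiley).length);  // 1
console.log(firstCodePoint(smiley).length); // 2

// The other switch under discussion: \d and \w (and so \b) are
// ASCII-only today, while \s already matches Unicode whitespace.
console.log(/\w/.test("\u00E9")); // false ("é" is not a word character)
console.log(/\d/.test("\u0660")); // false (ARABIC-INDIC DIGIT ZERO)
console.log(/\s/.test("\u3000")); // true  (IDEOGRAPHIC SPACE)
```

The sketch only shows what each switch would change; whether one flag should flip both or each should get its own is still the open question.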
Norbert

On Mar 17, 2012, at 11:22, Steven L. wrote:

> Eric Corry wrote:
>>> Disagree with adding /u for this purpose and disagree with breaking backward
>>> compatibility to let `/./.exec(s)[0].length == 2`.
>>
>> Care to enlighten us with any thinking behind this disagreeing?
>
> Sorry for the rushed and overly ebullient message. I disagreed with /u for
> switching from code unit to code point mode because in the moment I didn't
> think a code point mode necessary or particularly beneficial. Upon further
> reflection, I realize I rushed into this opinion, and I will be more closely
> examining the related issues.
>
> I further objected because I think the /u flag would be better used as an
> ASCII/Unicode mode switcher for \d\w\b. My proposal for this is based on
> Python's re.UNICODE or (?u) flag, which does the same thing except that it
> also covers \s (which is already Unicode-based in ES). Therefore, I think
> that if a flag is added that only switches from code unit to code point mode,
> it should not be "u". Presumably, flag /u could simultaneously affect \d\w\b
> and switch to code point mode. I haven't yet thought enough about combining
> these two proposals to hold a strong opinion on the matter.
>
>>> there are two ways to match any Unicode
>>> grapheme that match existing regex library precedent:
>>>
>>> From Perl and PCRE:
>>> \X
>>
>> This doesn't work inside []. Were you envisioning the same restriction in
>> JS?
>>
>> Also it matches a grapheme cluster, which may be useful but is
>> completely different to what the dot does.
>
> You are of course correct. And yes, I was envisioning the same restriction
> within character classes. But I'm not a strong proponent of \X, especially if
> support for Unicode categories is added.
>
>> I agree with Steven that these two cases should just be left alone,
>> which means they will continue to work the way they have until now.
>
> Glad to hear it.
>
>> You seem to be confusing graphemes and Unicode code points.
>> [...]
>> The proposal you are responding to is all about adding Unicode code
>> point handling to regexps. It is not about adding grapheme support,
>> which is a rather different issue.
>
> Indeed. My response was rushed and poorly formed. My apologies.
>
> --Steven Levithan

_______________________________________________
es-discuss mailing list
https://mail.mozilla.org/listinfo/es-discuss

