Steven Levithan wrote:
* \s == [\x09-\x0D] -- Java, PCRE, Ruby, Python (default).
* \s == [\x09-\x0D\p{Z}] -- ES-current, .NET, Perl, Python (with (?u)).

Oops. My ASCII-only version of \s is obviously missing space \x20 and no-break space \xA0 (which are included in Unicode's \p{Z}).
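
For anyone who wants to check the correction, here is a minimal sketch in Python 3. It assumes Python 3's re module, where str patterns are Unicode-aware by default and re.ASCII restores the ASCII-only behaviour (the "(?u)" note above describes the older default). The helper name matches_space is made up for illustration.

```python
import re
import unicodedata

SPACE = "\x20"   # ordinary space
NBSP = "\xa0"    # no-break space

def matches_space(ch, ascii_only):
    """Return True if \\s matches the single character ch.

    Hypothetical helper: ascii_only=True applies re.ASCII,
    ascii_only=False uses Python 3's Unicode-aware default.
    """
    flags = re.ASCII if ascii_only else 0
    return re.fullmatch(r"\s", ch, flags) is not None

for name, ch in (("space", SPACE), ("no-break space", NBSP)):
    print(
        name,
        "category:", unicodedata.category(ch),
        "ascii \\s:", matches_space(ch, ascii_only=True),
        "unicode \\s:", matches_space(ch, ascii_only=False),
    )
```

Both characters are in general category Zs, which is part of \p{Z}. The space matches in both modes, while the no-break space only matches in the Unicode-aware mode.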

Erik Corry wrote:
Steven Levithan wrote:
[:alnum:] in Perl, PCRE, Ruby, Tcl, POSIX/GNU BRE/ERE, etc. matches only
[A-Za-z0-9]. Making it Unicode-based in ES would be confusing.

This would be pretty useless and is not true in perl. I tried the following:

perl -e "use utf8; print 'æ' =~ /[[:alnum:]]/ . \"\n\";"

and it prints 1, indicating a match.

***<Updating my mental notes>*** Roger that. Online docs (including the Perl-specific page you linked to earlier) typically list [:alnum:] as [A-Za-z0-9], but I've just done some quick testing and it seems that regex packages supporting [:alnum:] give it at least three different meanings:

* [A-Za-z0-9]
* [\p{Ll}\p{Lu}\p{Lt}\p{Nd}]
* [\p{Ll}\p{Lu}\p{Lt}\p{Nd}\p{Nl}]
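
To make the difference between those three meanings concrete, here is a rough Python 3 sketch that models each one with the standard unicodedata module. It is not how any of the regex packages implement [:alnum:]; the function names and sample characters are just assumptions picked for illustration.

```python
import string
import unicodedata

ASCII_ALNUM = set(string.ascii_letters + string.digits)
LETTER_DIGIT_CATEGORIES = {"Ll", "Lu", "Lt", "Nd"}
WITH_LETTER_NUMBERS = LETTER_DIGIT_CATEGORIES | {"Nl"}

def alnum_ascii(ch):
    """Meaning 1: [A-Za-z0-9]."""
    return ch in ASCII_ALNUM

def alnum_unicode(ch):
    """Meaning 2: [\\p{Ll}\\p{Lu}\\p{Lt}\\p{Nd}]."""
    return unicodedata.category(ch) in LETTER_DIGIT_CATEGORIES

def alnum_unicode_nl(ch):
    """Meaning 3: meaning 2 plus \\p{Nl} (letter numbers)."""
    return unicodedata.category(ch) in WITH_LETTER_NUMBERS

samples = {
    "a": "a",                              # Ll, ASCII
    "ae ligature": "\u00e6",               # Ll, non-ASCII
    "Arabic-Indic digit three": "\u0663",  # Nd
    "Roman numeral eight": "\u2167",       # Nl
    "underscore": "_",                     # Pc, alnum under none of the three
}

for name, ch in samples.items():
    print(name, alnum_ascii(ch), alnum_unicode(ch), alnum_unicode_nl(ch))
```

Only the third meaning accepts the Roman numeral, and only the first rejects æ and the Arabic-Indic digit.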

Note that although Java doesn't support POSIX character class syntax, it too supports alnum via \p{Alnum}. Java's alnum matches only [A-Za-z0-9].

Anyway, this is probably all moot, unless someone wants to officially propose POSIX character classes for ES RegExp. ...In which case I'll be happy to state about a half-dozen reasons to not do so. :)

Erik Corry wrote:
OK, I'm convinced that /u should make \d, \b and \w Unicode aware.

w00t!
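
As a rough analogy for what a Unicode-aware \d, \b and \w would mean in practice, here is a short Python 3 sketch. Python's re.ASCII flag stands in for the current ASCII-only behaviour and Python's Unicode default stands in for the proposed /u behaviour. This only illustrates the distinction; it says nothing about how ES would specify or implement it, and the helper names are hypothetical.

```python
import re

def is_match(pattern, text, ascii_only):
    """Return True if pattern matches all of text in the chosen mode."""
    flags = re.ASCII if ascii_only else 0
    return re.fullmatch(pattern, text, flags) is not None

def words(text, ascii_only):
    """Split text into words using \\b and \\w in the chosen mode."""
    flags = re.ASCII if ascii_only else 0
    return re.findall(r"\b\w+\b", text, flags)

digit = "\u0663"                 # Arabic-Indic digit three
letter = "\u00e6"                # ae ligature
text = "na\u00efve caf\u00e9"    # "naive cafe" with accented letters

print(is_match(r"\d", digit, ascii_only=True))    # False
print(is_match(r"\d", digit, ascii_only=False))   # True
print(is_match(r"\w", letter, ascii_only=True))   # False
print(is_match(r"\w", letter, ascii_only=False))  # True
print(words(text, ascii_only=True))    # ['na', 've', 'caf']
print(words(text, ascii_only=False))   # the two whole words
```

The last pair shows why \b has to change together with \w: with an ASCII-only \w, every accented letter counts as a word boundary and words get chopped up.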

--Steven Levithan


_______________________________________________
es-discuss mailing list
es-discuss@mozilla.org
https://mail.mozilla.org/listinfo/es-discuss
