Re: Questions regarding ES6 Unicode regular expressions

Claude Pache Tue, 26 Aug 2014 12:45:36 -0700

On 26 Aug 2014, at 20:15, Mathias Bynens <[email protected]> wrote:


> On 26 Aug 2014, at 19:01, Allen Wirfs-Brock <[email protected]> wrote:
> 
>> I've thought about this a bit. I was initially inclined to agree with the 
>> idea of extending the existing character classes similar to what Mathias 
>> proposes.  But I now think that is probably not a very good idea and that 
>> what is currently spec'ed (essentially that the /u flag doesn't change the 
>> meaning of \w, \d, etc.) is the better path. […] It seems to me, that we 
>> want programmers to start migrating to full Unicode regular expressions 
>> without having to do a major logic rewrite of their code.  For example, 
>> ideally the above expression could simply be replaced by 
>> `parseInt(/\s*(\d+)/u.exec(input)[1])` and everything in the application 
>> could continue to work unchanged.
> 
> I see your point, but I disagree with the notion that we must absolutely 
> maintain backwards compatibility in this case. The fact that the new flag is 
> opt-in gives us an opportunity to improve behavior without obsessing about 
> back-compat, similar to how the strict mode opt-in is used to make all sorts 
> of things better. When [evangelizing 
> `/u`](https://mathiasbynens.be/notes/es6-unicode-regex), we can educate 
> developers and tell them to not blindly/needlessly add `/u` to their existing 
> regular expressions.
> 
>> Instead, we should leave the definitions of \d, \w and \s unchanged and plan 
>> to adopt the already established convention that `\p{<Unicode property>}` is 
>> the notation for matching Unicode categories. See 
>> http://www.regular-expressions.info/unicode.html 
> 
> We could do both: improve `\d` and `\w` now, and add `\p{property}` and 
> `\P{property}` later. Anyhow, I’ve filed 
> https://bugs.ecmascript.org/show_bug.cgi?id=3157 for reserving 
> `\p{…}`/`\P{…}`.

The meaning of `\d` should not be changed; it is routinely used as a synonym for 
`[0-9]`. Changing its meaning would willfully introduce traps into the language, 
and it *will* produce bugs, for very little gain. It is much safer to learn to 
use `\pN` in the rare situations where one wants to match numerical characters 
in any script.
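
To make the trap concrete, here is a minimal sketch. It assumes an engine that supports the `u` flag and the braced `\p{…}` property escapes (the notation under discussion above, written here as `\p{Nd}` rather than the `\pN` shorthand, which I would not count on being accepted). The variable names are only illustrative.

```javascript
// Assumption: the engine supports the `u` flag and \p{...} property escapes.
const input = "  42 apples";

// With \d left unchanged, adding /u does not alter the result.
const legacy = parseInt(/\s*(\d+)/.exec(input)[1], 10);
const unicode = parseInt(/\s*(\d+)/u.exec(input)[1], 10);
console.log(legacy, unicode); // 42 42

// Arabic-Indic digits four and two (U+0664, U+0662).
const arabicIndic = "\u0664\u0662";

// \d stays a synonym of [0-9], so it does not match them, even with /u.
console.log(/^\d+$/u.test(arabicIndic)); // false

// The explicit property escape matches decimal digits in any script.
console.log(/^\p{Nd}+$/u.test(arabicIndic)); // true

// The trap: parseInt only understands ASCII digits, so a widened \d
// would hand it text that it cannot parse.
console.log(parseInt(arabicIndic, 10)); // NaN
```

The last line is the point: code that relies on `\d` meaning `[0-9]` typically feeds the match to something like `parseInt`, which would silently yield `NaN` if `\d` started matching digits from other scripts.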

For `\w` and `\b`, on the other hand, the meaning can be corrected, because nobody 
would normally consider that there are two word boundaries in the middle of 
"fiancée", and such semantics are not useful, especially in Unicode-aware 
contexts (that is, in situations where you should use the `u` flag).
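
A small sketch of the "fiancée" case, for illustration. The first half shows the behavior as currently spec'ed; the second half builds a hypothetical Unicode-aware boundary by hand, assuming an engine with `\p{…}` property escapes and lookbehind assertions. The helper names `W` and `unicodeBoundary` are mine, and treating letters, numbers and `_` as word characters is just one plausible definition.

```javascript
// "fiancée", with é written as the single code point U+00E9.
const word = "fianc\u00E9e";

// \w is [A-Za-z0-9_] with or without /u, so "é" is a non-word character
// and \b matches on both sides of it.
const pieces = word.split(/\b/u);
console.log(pieces); // ["fianc", "é", "e"]

// Hypothetical Unicode-aware word character: any letter, any number, or "_".
// Assumption: the engine supports \p{...} and lookbehind.
const W = "[\\p{L}\\p{N}_]";
const unicodeBoundary = new RegExp(
  `(?<!${W})(?=${W})|(?<=${W})(?!${W})`,
  "u"
);

// With that definition there is no boundary inside the word.
const whole = word.split(unicodeBoundary);
console.log(whole); // ["fiancée"]
```

This sketch does not handle decomposed text (an `e` followed by the combining accent U+0301), since combining marks are not in `\p{L}`; a real definition would have to decide what to do with them.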

—Claude


_______________________________________________
es-discuss mailing list
[email protected]
https://mail.mozilla.org/listinfo/es-discuss
