Re: Full Unicode based on UTF-16 proposal

Norbert Lindenberg Sat, 17 Mar 2012 12:38:19 -0700

On Mar 17, 2012, at 11:58 , Erik Corry wrote:

> 2012/3/17 Steven L. <[email protected]>:
>> I further objected because I think the /u flag would be better used as a
>> ASCII/Unicode mode switcher for \d\w\b. My proposal for this is based on
>> Python's re.UNICODE or (?u) flag, which does the same thing except that it
>> also covers \s (which is already Unicode-based in ES).
> 
> I am rather skeptical about treating \d like this.  I think "any digit
> including rods and roman characters but not decimal points/commas"
> http://en.wikipedia.org/wiki/Numerals_in_Unicode#Counting-rod_numerals
> would be needed much less often than the digits 0-9, so I think
> hijacking \d for this case is poor use of name space.  The \d escape
> in perl does not cover other Unicode numerals, and even with the
> [:name:] syntax there appears to be no way to get the Unicode
> numerals: 
> http://search.cpan.org/~flora/perl-5.14.2/pod/perlrecharclass.pod#POSIX_Character_Classes
> This suggests to me that it's not very useful.


Looking at that page, it seems \d gives you a reasonable set of digits: the 
ones in the Unicode general category Nd (Number, decimal digit). These digits 
come from a variety of writing systems, but all are used in decimal positional 
notation, so you can parse at least integers with a fairly generic algorithm.
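
To illustrate what such a generic algorithm might look like, here is a 
minimal sketch in Python. It relies on the standard unicodedata module; the 
function name parse_decimal_integer is made up for this example, and the 
sketch ignores signs, separators, and mixed-script checks.

```python
import unicodedata


def parse_decimal_integer(text: str) -> int:
    """Parse a non-empty run of Unicode Nd digits as a base-10 integer.

    Hypothetical helper for illustration only: no sign handling, no
    grouping separators, no check that all digits share one script.
    """
    if not text:
        raise ValueError("empty string")
    value = 0
    for ch in text:
        # Only general category Nd (Number, decimal digit) is accepted;
        # Roman numerals (Nl) and counting rods (No) are rejected here.
        if unicodedata.category(ch) != "Nd":
            raise ValueError(f"not a decimal digit: {ch!r}")
        # Every Nd character carries a decimal value from 0 to 9.
        value = value * 10 + unicodedata.decimal(ch)
    return value


# ARABIC-INDIC DIGITS ONE, TWO, THREE
print(parse_decimal_integer("\u0661\u0662\u0663"))
# ASCII digits go through the same code path
print(parse_decimal_integer("2012"))
```

The same loop works for any script whose digits are in Nd, which is the 
property that makes that category convenient for \d.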

Dealing with Roman numerals or counting rods requires specialized algorithms, 
so you probably don't want to find them in this bucket.

Norbert

_______________________________________________
es-discuss mailing list
[email protected]
https://mail.mozilla.org/listinfo/es-discuss
