Re: Full Unicode based on UTF-16 proposal

Erik Corry Sat, 17 Mar 2012 11:58:40 -0700

2012/3/17 Steven L. <[email protected]>:
> I further objected because I think the /u flag would be better used as a
> ASCII/Unicode mode switcher for \d\w\b. My proposal for this is based on
> Python's re.UNICODE or (?u) flag, which does the same thing except that it
> also covers \s (which is already Unicode-based in ES).


I am rather skeptical about treating \d like this.  I think "any digit
including counting rods and Roman numerals, but not decimal points or commas"
(http://en.wikipedia.org/wiki/Numerals_in_Unicode#Counting-rod_numerals)
would be needed much less often than the digits 0-9, so I think
hijacking \d for this case is a poor use of the namespace.  The \d escape
in Perl does not cover these other Unicode numerals, and even with the
[:name:] syntax there appears to be no way to get them:
http://search.cpan.org/~flora/perl-5.14.2/pod/perlrecharclass.pod#POSIX_Character_Classes
This suggests to me that such a class is not very useful.
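
To make the distinction concrete, here is a minimal sketch of how the
digits 0-9 differ from the wider Unicode numeral categories.  It assumes
an engine that supports the u flag and \p{...} property escapes; the
names (samples, classify) are only illustrative.

```javascript
// Sketch: ASCII \d versus Unicode numeral categories.
// Assumes support for the u flag and \p{...} property escapes.
const samples = {
  asciiSeven: "7",           // U+0037 DIGIT SEVEN
  arabicIndic: "\u0663",     // U+0663 ARABIC-INDIC DIGIT THREE (category Nd)
  romanNumeral: "\u2163",    // U+2163 ROMAN NUMERAL FOUR (category Nl)
  countingRod: "\u{1D360}",  // U+1D360 COUNTING ROD UNIT DIGIT ONE (category No)
};

const asciiDigit = /^\d$/;         // the digits 0-9 only
const decimalDigit = /^\p{Nd}$/u;  // decimal digits in any script
const anyNumeral = /^\p{N}$/u;     // every numeric category: Nd, Nl and No

// Report which of the three classes a single character falls into.
function classify(ch) {
  return {
    ascii: asciiDigit.test(ch),
    decimal: decimalDigit.test(ch),
    numeral: anyNumeral.test(ch),
  };
}

for (const [name, ch] of Object.entries(samples)) {
  console.log(name, classify(ch));
}
```

Only the widest class picks up the Roman numeral and the counting rod,
which is the case I would expect to be rare in practice.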

And instead of changing the meaning of \w, which would be confusing, I
think that [:alnum:] as in Perl would work fine.
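
As a rough sketch of the difference, the snippet below compares \w with
a Unicode alphanumeric class.  The class [\p{L}\p{N}] is only an assumed
stand-in for a Unicode [:alnum:]; the exact membership of such a class
would be up to the proposal.

```javascript
// Sketch: ASCII \w versus an assumed Unicode "alnum" class.
// [\p{L}\p{N}] (letters plus numbers) is a hypothetical approximation.
const asciiWord = /^\w$/;                // [A-Za-z0-9_]
const unicodeAlnum = /^[\p{L}\p{N}]$/u;  // letters and numbers in any script

// Compare the two classes on a single character.
function compare(ch) {
  return { word: asciiWord.test(ch), alnum: unicodeAlnum.test(ch) };
}

console.log(compare("a"));       // ASCII letter
console.log(compare("\u00E9"));  // U+00E9 LATIN SMALL LETTER E WITH ACUTE
console.log(compare("_"));       // underscore: in \w, but not alphanumeric
```

Note that the two classes differ in both directions: \w misses accented
letters, while an alphanumeric class leaves out the underscore.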

\b is a little tougher.  The Unicode rewrite would be
(?:(?<![:alnum:])(?=[:alnum:])|(?<=[:alnum:])(?![:alnum:])) which is
obviously too verbose.  But if we take \b for this then the ASCII
version has to be written as
(?:(?<!\w)(?=\w)|(?<=\w)(?!\w)) which is also more than a little
annoying.  However, if you have negative lookbehind you often don't
need that, because you can write something like

/(?<!\w)word(?!\w)/    // Negative lookbehind for \w before the word,
                       // negative lookahead for \w after it.

which isn't _too_ bad, even if it is much worse than

/\bword\b/
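
A small sketch of the comparison, assuming an engine that has lookbehind
and \p{...} property escapes.  The Unicode variant again uses
[\p{L}\p{N}] as an assumed stand-in for [:alnum:], and the helper name
positions is only illustrative.

```javascript
// Sketch: \b versus explicit lookaround, in ASCII and Unicode flavours.
// Assumes lookbehind and \p{...} support; [\p{L}\p{N}] stands in for [:alnum:].
const asciiEscape = /\bword\b/g;
const asciiLookaround = /(?<!\w)word(?!\w)/g;
const unicodeLookaround = /(?<![\p{L}\p{N}])word(?![\p{L}\p{N}])/gu;

// Collect the start index of every match of re in text.
function positions(re, text) {
  return Array.from(text.matchAll(re), (m) => m.index);
}

const sample = "word, sword, wordy, a word.";
console.log(positions(asciiEscape, sample));      // standalone "word" only
console.log(positions(asciiLookaround, sample));  // same positions as \b

// With a non-ASCII letter in front, the ASCII and Unicode readings disagree.
const accented = "\u00E9word";
console.log(positions(asciiEscape, accented));
console.log(positions(unicodeLookaround, accented));
```

For a literal that starts and ends with a \w character the lookaround
form finds the same matches as \b, so the cost is only the verbosity.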

> Indeed. My response was rushed and poorly formed. My apologies.

Gratefully accepted with the hope that my next rushed and poorly
formed response will also be forgiven!

-- 
Erik Corry
_______________________________________________
es-discuss mailing list
[email protected]
https://mail.mozilla.org/listinfo/es-discuss
