2012/3/17 Steven L. <[email protected]>:
> I further objected because I think the /u flag would be better used as an
> ASCII/Unicode mode switcher for \d\w\b. My proposal for this is based on
> Python's re.UNICODE or (?u) flag, which does the same thing except that it
> also covers \s (which is already Unicode-based in ES).

I am rather skeptical about treating \d like this. I think "any digit
including rods and roman characters but not decimal points/commas"
http://en.wikipedia.org/wiki/Numerals_in_Unicode#Counting-rod_numerals
would be needed much less often than the digits 0-9, so I think hijacking
\d for this case is a poor use of name space. The \d escape in perl does
not cover other Unicode numerals, and even with the [:name:] syntax there
appears to be no way to get the Unicode numerals:
http://search.cpan.org/~flora/perl-5.14.2/pod/perlrecharclass.pod#POSIX_Character_Classes
This suggests to me that it's not very useful.

And instead of changing the meaning of \w, which will be confusing, I
think that [:alnum:] as in perl would work fine.

\b is a little tougher. The Unicode rewrite would be

(?:(?<![:alnum:])(?=[:alnum:])|(?<=[:alnum:])(?![:alnum:]))

which is obviously too verbose. But if we take \b for this then the ASCII
version has to be written as

(?:(?<!\w)(?=\w)|(?<=\w)(?!\w))

which is also more than a little annoying. However, often you don't need
that if you have negative lookbehind because you can write something like

/(?<!\w)word(?!\w)/  // Negative look-behind for a \w at the start and negative look-ahead for a \w at the end.

which isn't _too_ bad, even if it is much worse than

/\bword\b/

> Indeed. My response was rushed and poorly formed. My apologies.

Gratefully accepted with the hope that my next rushed and poorly formed
response will also be forgiven!

-- 
Erik Corry
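
For readers who want to check the \b rewrite from the message above, here is a minimal sketch. It assumes an engine with lookbehind support (ES2018 or later), which ES did not have at the time of this thread. The helper name positionsOf and the sample strings are hypothetical and chosen only for illustration. The sketch compares the built-in ASCII \b with the spelled-out lookaround form, then tries the shorter form that puts one lookaround at each end of the word.

```javascript
// Sketch only: needs lookbehind support (ES2018+), which ES lacked in 2012.
const builtin = /\b/;
const spelledOut = /(?:(?<!\w)(?=\w)|(?<=\w)(?!\w))/;

// Hypothetical helper: list every index where a zero-width pattern matches.
function positionsOf(re, text) {
  const sticky = new RegExp(re.source, "y"); // "y" anchors the match at lastIndex
  const found = [];
  for (let i = 0; i <= text.length; i++) {
    sticky.lastIndex = i;
    if (sticky.test(text)) found.push(i);
  }
  return found;
}

const sample = "a word, reworded";
const sameBoundaries =
  JSON.stringify(positionsOf(builtin, sample)) ===
  JSON.stringify(positionsOf(spelledOut, sample));
console.log(sameBoundaries);

// The shorter form from the message, with lookarounds only at the two ends.
const shortForm = /(?<!\w)word(?!\w)/;
for (const text of ["a word here", "swordfish", "wording"]) {
  console.log(text, shortForm.test(text), /\bword\b/.test(text));
}
```

Both patterns here are ASCII-only, since \w in ES covers just [A-Za-z0-9_]. A Unicode-aware variant would need a different character class in place of \w, which is the open question in the thread.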

