On Nov 13, 2007 12:18 PM, Gunnar Hjalmarsson <[EMAIL PROTECTED]> wrote:
> Chas. Owens wrote:
> > On Nov 13, 2007 10:58 AM, Chas. Owens <[EMAIL PROTECTED]> wrote:
> > snip
> >> I believe you want /\d[a-z]{2}/i.
> > snip
> >
> > Oops, I didn't pay attention to my own warning about \d, I should have said
> >
> > /[0-9][a-z]{2}/i
>
> Are you saying that \d is no longer equivalent to [0-9]? If so, which
> digits does \d match besides [0-9]?
snip

Yep, that is what I am saying. The \d character class matches any
character that Unicode classifies as a decimal digit, not just the
ASCII digits 0 through 9. The following program outputs

Wide character in print at t.pl line 8.
Mongolian digit three is ᠓
it is a number (using \d)
it is not a number (using [0-9])
it is not a number (using looks_like_number)

I assume the reasoning for this is that regexes are text based, not
datatype based. That is, the string "\x{1811}\x{1812}\x{1813}" is a
number ("123" in Mongolian) in text even if it isn't one Perl can do
math with.

<sarcasm>Frankly, I blame all the foreigners, they should just learn
English and use ASCII</sarcasm>

#!/usr/bin/perl

use warnings;
use strict;
use Scalar::Util qw<looks_like_number>;

my $three = "\x{1813}";    # MONGOLIAN DIGIT THREE
print "Mongolian digit three is $three\n";

# \d follows Unicode rules for this string, so it matches
if ($three =~ /\d/) {
    print "it is a number (using \\d)\n";
} else {
    print "it is not a number (using \\d)\n";
}

# the explicit range only covers the ASCII digits
if ($three =~ /[0-9]/) {
    print "it is a number (using [0-9])\n";
} else {
    print "it is not a number (using [0-9])\n";
}

# looks_like_number asks whether Perl itself would treat it as numeric
if (looks_like_number($three)) {
    print "it is a number (using looks_like_number)\n";
} else {
    print "it is not a number (using looks_like_number)\n";
}
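
If what you want is a check that only ever accepts ASCII digits, one way to
keep the two behaviours apart is to wrap each in its own sub. This is only a
rough sketch: the names is_ascii_digits and is_unicode_digits are made up for
illustration and are not from any module.

```perl
#!/usr/bin/perl

use warnings;
use strict;

# Hypothetical helper: true only if the whole string is ASCII digits 0-9.
sub is_ascii_digits {
    my ($s) = @_;
    return $s =~ /\A[0-9]+\z/ ? 1 : 0;
}

# Hypothetical helper: true if the whole string is made of characters
# that \d accepts, which includes non-ASCII decimal digits.
sub is_unicode_digits {
    my ($s) = @_;
    return $s =~ /\A\d+\z/ ? 1 : 0;
}

# Declaring an output encoding avoids the "Wide character" warning.
binmode STDOUT, ':encoding(UTF-8)';

for my $s ("123", "\x{1811}\x{1812}\x{1813}") {
    printf "%s: \\d says %s, [0-9] says %s\n",
        $s,
        is_unicode_digits($s) ? "yes" : "no",
        is_ascii_digits($s)   ? "yes" : "no";
}
```

The explicit [0-9] class works on any Perl. On Perl 5.14 and later the /a
regex modifier should give the same ASCII-only behaviour for \d, if you would
rather keep the shorthand.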