On Sat, Jan 31, 2004 at 02:07:07PM +0000, Markus Kuhn wrote:
> Question: What is a quick way in Perl to get a regular expression that
> matches all Unicode characters in the range U0100..U10FFFF, in other
> words all non-ASCII Unicode characters?
It looks like /[\x{100}-\x{10FFFF}]/ should do that, but it doesn't work
here.
perl -v
This is perl, v5.8.2 built for i386-linux-thread-multi
LANG=en_US.UTF-8
perl -ne 'if(/^(\x{61})$/) { print "$1\n"; }'
(in) a
(out) a
perl -ne 'if(/^(\x{fa})$/) { print "$1\n"; }'
(in) ú
(nothing out)
perl -ne 'if(/^(.)$/) { print "$1\n"; }'
(in) a
(out) a
(in) ú
(nothing out)
grep '^.$'
(in) a
(out) a
(in) ú
(out) ú
perl -ne 'if(/^(..)$/) { print "$1\n"; }'
(in) ú
(out) ú
Why is "." matching a single byte in perl, instead of a single codepoint? Why
isn't \x{fa} working?
--
Glenn Maynard
--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/