On Sat, Jan 31, 2004 at 02:07:07PM +0000, Markus Kuhn wrote:
> Question: What is a quick way in Perl to get a regular expression that
> matches all Unicode characters in the range U0100..U10FFFF, in other
> words all non-ASCII Unicode characters?

It looks like /[\x{100}-\x{10FFFF}]/ should do that, but it doesn't work
here.

perl -v
This is perl, v5.8.2 built for i386-linux-thread-multi
LANG=en_US.UTF-8

perl -ne 'if(/^(\x{61})$/) { print "$1\n"; }'
(in) a
(out) a

perl -ne 'if(/^(\x{fa})$/) { print "$1\n"; }'
(in) ú
(nothing out)

perl -ne 'if(/^(.)$/) { print "$1\n"; }'
(in) a
(out) a
(in) ú
(nothing out)

grep '^.$'
(in) a
(out) a
(in) ú
(out) ú

perl -ne 'if(/^(..)$/) { print "$1\n"; }'
(in) ú
(out) ú

Why is "." matching a single byte in perl, instead of a single codepoint? Why
isn't \x{fa} working?

-- 
Glenn Maynard

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
