Stanisław T. Findeisen wrote:
Gunnar Hjalmarsson wrote:
Stanisław T. Findeisen wrote:
Hi, how do I write regular expressions that match against Unicode (e.g.,
UTF-8) strings?
For instance, in my regexp:
qr/^([.<>@ \w])*$/
Decode the UTF-8 encoded strings before applying the regex on them.
$ perl -MEncode -le '
$utf8_encoded = "smörgåsbord";
$s = decode "UTF-8", $utf8_encoded;
print "Match" if $s =~ /^\w+$/;
'
Match
$
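To tie this back to the regex from the original question, here is a minimal sketch that applies qr/^([.<>@ \w])*$/ to the same data before and after decoding. The sample name and address are made up for illustration, and the UTF-8 bytes are written as \x escapes so the result does not depend on how the script file itself is saved.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The character class from the original question.
my $re = qr/^([.<>@ \w])*$/;

# Hypothetical sample input: "Jörgen <j@example.org>" as raw UTF-8 bytes
# (\xC3\xB6 is the two-byte UTF-8 encoding of "ö").
my $bytes = "J\xC3\xB6rgen <j\@example.org>";

# Decoding turns the byte string into a character string.
my $chars = decode('UTF-8', $bytes);

my $bytes_match = $bytes =~ $re ? 1 : 0;
my $chars_match = $chars =~ $re ? 1 : 0;

print "bytes:   ", ($bytes_match ? "Match" : "No match"), "\n";
print "decoded: ", ($chars_match ? "Match" : "No match"), "\n";
```

The undecoded string should report "No match", because the two bytes of "ö" are not both word characters on their own, while the decoded string should report "Match".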
Thanks, decode helped with this. But can I ask you one more question?
What assumptions does Perl make about the encoding of the input file
(i.e., the program/script file)?
AFAIK, it just converts the bytes into Perl's internal format, but it
does not assume anything (at least not by default) with respect to the
character encoding.
So are string literals in Perl in fact byte arrays?
String literals in a Perl script are byte *strings* until decoded.
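A small sketch of the difference, assuming the script file is saved as UTF-8 with precomposed "ö" and "å". The utf8 pragma does not come up in this thread; it is the standard way to tell Perl that the source itself is UTF-8, so literals in its scope are decoded at compile time.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Assumption: this file is saved as UTF-8.

# Without the utf8 pragma the literal is a byte string:
# "ö" and "å" each occupy two bytes.
my $raw = "smörgåsbord";

my $literal;
{
    # Inside this block Perl decodes the source as UTF-8,
    # so the same literal becomes a character string.
    use utf8;
    $literal = "smörgåsbord";
}

# Decoding the byte string by hand gives the same characters.
my $decoded = decode('UTF-8', $raw);

print "raw length:     ", length($raw), "\n";
print "literal length: ", length($literal), "\n";
print "same after decode: ", ($decoded eq $literal ? "yes" : "no"), "\n";
```

Under those assumptions the raw literal should be 13 bytes long and the literal under "use utf8" 11 characters long, and the hand-decoded string should compare equal to the latter.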
What you type is what you get?
Not sure what you mean by that.
You may find http://perldoc.perl.org/perlunitut.html helpful.
--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl