Re: \w regular expressions unicode

Gunnar Hjalmarsson Wed, 22 Apr 2009 12:25:48 -0700

Stanisław T. Findeisen wrote:

Gunnar Hjalmarsson wrote:
What assumptions does Perl make regarding input file (i.e., the program/script file) encoding?
AFAIK, it just converts the bytes into Perl's internal format, but it does not assume anything (at least not by default) with respect to the character encoding.
Is it so that string literals in Perl are byte arrays in fact?
String literals in a Perl script are byte *strings* until decoded.
Yeah, it looks so. With "use utf8" (http://perldoc.perl.org/utf8.html) one can however make them parsed (decoded) (provided they are valid UTF-8).

No. The utf8 pragma is about allowing UTF-8 encoded *symbols*, e.g. variable names or subroutine names.


$ perl -MEncode -le '
$s = "smörgåsbord";
print length $s;
use utf8;
print length $s;
$s = decode "UTF-8", $s;
print length $s;
'
13
13
11
$

It's all about the UTF8 flag: http://perldoc.perl.org/Encode.html#The-UTF8-flag .


Maybe... That's above my head right now, I'm afraid.

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
