On 05/16/2007 12:57 AM, Neil wrote:
Dear All:
Question:
How come the length of the Chinese character I print shows "3"?
Isn't it supposed to be 2 bytes?
Program:
-----------------------------------
$str=”我”;
$str_len = length($str);
Print $str_len, “\n\n”;
------------------------------------
The result is 3
I took a picture of the program, in case the Chinese character doesn't
display on some of your systems.
[...]
My environment:
[...]
Encode: Big5
Something is messed up with your locale or environment. Since you only
have one character in $str, the length should be "1"--and that's what I get.
I saved your program two ways: as a utf8 file and as a big5 file; both
programs produce the same result on my system: 1; however, to get your
program to run, I had to change the quotes.
Here is the first program (saved in UTF8):
-----------------------------------
#!/usr/bin/perl
use utf8;
use strict;
use warnings;
my $str="我";
my $str_len = length($str);
print $str_len, "\n\n";
----------------------------------
Here is the second program (saved in Big5):
--------------------------------------------
#!/usr/bin/perl
use encoding big5 => STDOUT => 'utf8';
use strict;
use warnings;
my $str="§Ú";
my $str_len = length($str);
print $str_len, "\n\n";
print "data = $str\n";
--------------------------------------------
The second program displays this:
------start output-------
1
data = 我
-------end output--------
Evidently the Big5 byte sequence \xA7\xDA represents the single
Unicode character U+6211, which is the Chinese character 我. You probably
just need to tell Perl about the encoding of your script.
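For what it's worth, a result of 3 is consistent with the script having been
saved as UTF-8 and read as raw bytes, since 我 takes three bytes in UTF-8 and
two in Big5. Below is a minimal sketch of the byte-versus-character
difference using the core Encode module. The variable names are my own, and
it assumes Encode's Big5 tables (Encode::TW) are installed, as they are with
a stock Perl 5.8. The bytes are written as escapes so the result does not
depend on how the file itself is saved.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode);

# The two bytes Big5 uses for the character, written as escapes.
my $big5_bytes = "\xA7\xDA";

# Decode the bytes into a Perl character string; length() now counts characters.
my $chars = decode('big5', $big5_bytes);

# Re-encode the same character as UTF-8 to see how many bytes it needs there.
my $utf8_bytes = encode('UTF-8', $chars);

printf "Big5 bytes:  %d\n", length($big5_bytes);                   # 2
printf "characters:  %d (U+%04X)\n", length($chars), ord($chars);  # 1 (U+6211)
printf "UTF-8 bytes: %d\n", length($utf8_bytes);                   # 3
```

Without "use utf8" or an explicit decode, length() sees only the undecoded
bytes, which would explain getting 2 or 3 rather than 1.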
My environment:
Perl 5.8.4
Debian 3.1
Encoding: UTF8