Chas. Owens wrote:
> On Dec 26, 2007 2:59 PM, Gunnar Hjalmarsson <[EMAIL PROTECTED]> wrote:
>> Well, then you'll probably need to identify the utf8 octet sequences
>> that correspond to the special characters you want to see transformed.
> snip
> Perl strings are in UTF-8*, but if you want to specify a character
> without using it directly (so the Perl file can still be treated as
> ASCII) you use the UNICODE representation instead:
>
> my $a_with_macron = "\x{0101}"; #UTF-8 encoding is C4 81
>
> So, knowing the UTF-8 sequences is fairly useless.

This is the approach I had in mind:
$ cat test.pl
#!/usr/bin/perl
use strict;
use warnings;
use Encode;
my $octets = <DATA>;                  # raw UTF-8 octets, not yet decoded
my $chars = decode 'utf8', $octets;   # the same text as a character string
# Map each UTF-8 octet sequence to its plain ASCII replacement.
my %special = ( "\xc3\x96" => 'O', "\xc3\xa5" => 'a' );
(my $translated = $octets) =~ s/(\xc3\x96|\xc3\xa5)/$special{$1}/g;
printf '%-28s%s', 'Raw data (utf8 encoded): ', $octets;
printf '%-28s%s', 'Readable characters: ', $chars;
printf '%-28s%s', 'Translated characters: ', $translated;
__DATA__
Östen Mogård
$ ./test.pl
Raw data (utf8 encoded):    Östen Mogård
Readable characters:        Östen Mogård
Translated characters:      Osten Mogard
However, I now realize that there ought to be smarter approaches...
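
One candidate, offered only as a sketch and not as something anyone in this thread has proposed: decode the octets to characters, decompose them with Unicode::Normalize, and drop the combining marks, so that no table of octet sequences is needed. The helper name strip_marks is made up for illustration, and the input is assumed to be valid UTF-8.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode);
use Unicode::Normalize qw(NFD);

# Hypothetical helper: takes UTF-8 octets and returns UTF-8 octets
# with all combining marks removed.
sub strip_marks {
    my ($octets) = @_;
    my $chars = decode('UTF-8', $octets);   # octets -> characters
    my $decomposed = NFD($chars);           # e.g. U+00D6 -> "O" + U+0308
    $decomposed =~ s/\p{Mn}//g;             # drop nonspacing (combining) marks
    return encode('UTF-8', $decomposed);    # characters -> octets
}

print strip_marks("\xc3\x96sten Mog\xc3\xa5rd\n");
```

This only helps for characters that have a canonical decomposition into a base letter plus marks. Letters such as U+00F8 (o with stroke) or U+00DF (sharp s) have none and would pass through unchanged, so an explicit mapping would still be needed for those.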
--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl