Chas. Owens wrote:
On Dec 26, 2007 2:59 PM, Gunnar Hjalmarsson <[EMAIL PROTECTED]> wrote:
Well, then you'll probably need to identify the utf8 octet sequences
that correspond to the special characters you want to see transformed.
snip

Perl strings are in UTF-8*, but if you want to specify a character
without typing it literally (so the Perl file can still be treated as
ASCII), you use its Unicode code point instead:

my $a_with_macron = "\x{0101}"; # U+0101; its UTF-8 encoding is C4 81

So, knowing the UTF-8 sequences is fairly useless.

This is the approach I had in mind:

$ cat test.pl
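#!/usr/bin/perl
use strict;
use warnings;
use Encode;

# Read one line of raw UTF-8 octets from the DATA section.
my $octets = <DATA>;

# Decode the octets into a Perl character string.
my $chars = decode 'utf8', $octets;

# Map the UTF-8 octet sequences for "Ö" and "å" to ASCII replacements,
# then substitute them in a copy of the raw octets.
my %special = ( "\xc3\x96" => 'O', "\xc3\xa5" => 'a' );
(my $translated = $octets) =~ s/(\xc3\x96|\xc3\xa5)/$special{$1}/g;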

printf '%-28s%s', 'Raw data (utf8 encoded): ', $octets;
printf '%-28s%s', 'Readable characters: ', $chars;
printf '%-28s%s', 'Translated characters: ', $translated;

__DATA__
Östen Mogård

$ ./test.pl
Raw data (utf8 encoded):    Östen Mogård
Readable characters:        Östen Mogård
Translated characters:      Osten Mogard

However, I now realize that there ought to be smarter approaches...
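
One possible smarter approach, sketched here as an assumption rather than
something settled in this thread: decode the octets first, decompose the
characters with the core Unicode::Normalize module, and strip the combining
marks. That avoids listing octet sequences by hand. The helper name
strip_marks is made up for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);
use Unicode::Normalize qw(NFD);

# Hypothetical helper: takes UTF-8 octets, returns the text with
# combining diacritical marks removed.
sub strip_marks {
    my ($octets) = @_;
    my $chars = decode('UTF-8', $octets);   # octets -> characters
    my $decomposed = NFD($chars);           # e.g. U+00D6 -> "O" + U+0308
    $decomposed =~ s/\p{Mn}//g;             # drop nonspacing marks
    return $decomposed;
}

# Same sample data as in test.pl, written as octet escapes.
print strip_marks("\xc3\x96sten Mog\xc3\xa5rd"), "\n";
```

This only helps for characters that have a canonical decomposition into a
base letter plus combining marks; letters such as "ø" or "ß" have none and
would still need an explicit mapping like the %special hash above.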

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
