Please find attached a character set conversion module.

It works like this:

use trans_charset;
my $t = trans_charset->new;
my $data = "äöüÄÖÜß and more funny chars";
my $output = $t->fromto(
        from => "mac", to => "latin1" );


Optionally, you can supply a single replacement character or a
hash reference that maps individual characters to replacements:

my $output = $t->fromto(
        from => "mac", to => "latin1",
        replacement => { chr( 222 ) => 'fi', chr( 176 ) => 'inf' } );

This example maps the fi-ligature and the infinity symbol only
available in MacRoman to something readable on other platforms.
The default is to replace every non-representable character with
a single dot '.'.
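
The choice between these three cases can be sketched roughly as
follows. This is only an illustration: pick_replacement is a
hypothetical helper, not a function of trans_charset, and falling back
to the dot for characters missing from the hash is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of trans_charset): choose the replacement
# for one character that cannot be represented in the target set.
sub pick_replacement {
    my ($char, $replacement) = @_;

    # No replacement option given: use the default dot.
    return '.' unless defined $replacement;

    # A plain scalar is used for every non-representable character.
    return $replacement unless ref($replacement) eq 'HASH';

    # A hash reference maps individual characters; anything not listed
    # falls back to the dot (assumption).
    return exists $replacement->{$char} ? $replacement->{$char} : '.';
}

my %map = ( chr(222) => 'fi', chr(176) => 'inf' );
my $ligature = pick_replacement( chr(222), \%map );   # listed in the hash
my $unlisted = pick_replacement( chr(240), \%map );   # not listed
my $single   = pick_replacement( chr(240), '?' );     # single character
my $default  = pick_replacement( chr(240), undef );   # no option given
```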

Another option is available to translate the line-break convention:
        linebreak => { from => "\x0d", to => "\x0a" }
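
The effect of that option is roughly the plain substitution below.
This is a sketch of the result, not the module's actual code; the
variable names are made up for the example.

```perl
use strict;
use warnings;

# Same shape as the linebreak option: classic Mac CR to Unix LF.
my %linebreak = ( from => "\x0d", to => "\x0a" );

my $mac_text = "line one\x0dline two\x0d";

# Copy the text, then replace every occurrence of the source line break.
# \Q...\E quotes the "from" string so it is matched literally.
( my $unix_text = $mac_text ) =~ s/\Q$linebreak{from}\E/$linebreak{to}/g;
```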


If you are on a Unix system, you can do it all with "iconv".
If you have Perl 5.8, you could use the Encode module.
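
For comparison, a minimal sketch of the same MacRoman to Latin-1
conversion with Encode (Perl 5.8 or later). The sample bytes are
chosen for the example. Characters that Latin-1 cannot represent are
handled by Encode's own substitution rules, not by the dot described
above.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Sample MacRoman bytes: a-umlaut, o-umlaut, u-umlaut, sharp s.
my $mac_bytes = "\x8A\x9A\x9F\xA7";

# Step 1: MacRoman bytes to Perl's internal Unicode string.
my $text = decode( "MacRoman", $mac_bytes );

# Step 2: Unicode string to ISO-8859-1 bytes.
my $latin1 = encode( "iso-8859-1", $text );
```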

Like the Encode module, my module works with the .ucm map files
from the Unicode Consortium. For each single byte I look up the
Unicode code point, then search for that code point in the map
file of the target character set.
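
The two-step lookup can be sketched like this. It is a simplified
illustration and not the code in trans_charset.pm: the tables are tiny
hand-copied excerpts instead of parsed .ucm files, convert_bytes is a
hypothetical name, and bytes below 0x80 are assumed to be plain ASCII
in both character sets.

```perl
use strict;
use warnings;

# Excerpt of a source table: MacRoman byte to Unicode code point.
my %mac_to_unicode = (
    0x8A => 0x00E4,    # a with diaeresis
    0xA7 => 0x00DF,    # sharp s
    0xB0 => 0x221E,    # infinity (no Latin-1 equivalent)
);

# Excerpt of a target table: Unicode code point to Latin-1 byte.
my %unicode_to_latin1 = (
    0x00E4 => 0xE4,
    0x00DF => 0xDF,
);

# Hypothetical helper: convert a byte string through the two tables.
sub convert_bytes {
    my ( $input, $to_unicode, $from_unicode, $fallback ) = @_;
    my $output = '';
    for my $byte ( unpack 'C*', $input ) {
        if ( $byte < 0x80 ) {    # ASCII range, passed through unchanged
            $output .= chr($byte);
            next;
        }
        # Step 1: source byte to Unicode code point.
        my $code_point = $to_unicode->{$byte};
        # Step 2: Unicode code point to target byte, if one exists.
        my $target = defined($code_point) ? $from_unicode->{$code_point} : undef;
        $output .= defined($target) ? chr($target) : $fallback;
    }
    return $output;
}

my $converted = convert_bytes( "a\x8A\xA7\xB0",
    \%mac_to_unicode, \%unicode_to_latin1, '.' );
```

The infinity sign has an entry in the source table but none in the
target table, so it comes out as the fallback character.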

The attached .sit file contains the module trans_charset.pm
which you should be able to install yourself.
"trans_charset.t" is for testing and demonstration.
"mac2latin1-droplet.pl" is some Perl code for building a MacPerl
droplet.
"mac2latin1-bbedit.pl" can be installed as a BBEdit Perl filter.
Please adapt those scripts to your own needs.

Supported character sets are MacRoman, ISO-8859-1, ISO-8859-15,
CP1250, CP850 and Adobe Standard Encoding.

Hope you find this useful.


Happy to hear your comments,
Axel

Attachment: %trans_charset.sit
Description: application/applefile

Attachment: trans_charset.sit
Description: Binary data
