On Sat, 5 Jul 2003 17:00:14 +0200, Martin Buchmann wrote:

>I was wondering if something like the recode program exists in pure
>Perl, i.e. a module which allows you to convert between different 8-bit
>character sets, e.g., DOS -> Mac. I often have problems when people
>provide 8-bit text files from a platform other than the Mac, because
>of the German umlauts, etc.

Well, I have something: UTF8::Simple. If you look back through the old
archives, you may find it discussed on this list a long time ago. I
never released it on CPAN, and I'm not sure it's still very relevant
now that Perl 5.8 has come out, but it's still very much in use at some
of our own sites. It still needs an update: there are some
incompatibilities between Perl 5.005 (and earlier) and 5.6 (and later),
so I now keep separate versions for these Perl versions, and I'd like
to merge them back. Anyway, you're very welcome to use it, just say
the word.

It works like this: UTF-8 is used as the intermediate format. One of
the reasons the module is called "UTF8::Simple" is that it only handles
single-byte character sets. I create closures that convert a
single-byte character string to UTF-8, or back, as specified by one of
the encoding tables on unicode.org's website,
 
        <http://www.unicode.org/Public/MAPPINGS/>

mostly under "VENDORS", such as

        <http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/MAC/ROMAN.TXT>

for Mac text (Mac OS Roman), and

        <http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1252.TXT>

for Windows.
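The closure scheme described above can be sketched in a few lines. This is not UTF8::Simple's actual code, which isn't shown here. It is a minimal illustration, written in Python rather than Perl, and the names parse_mapping, make_converters and mac_to_windows are hypothetical. The sketch makes three assumptions:

* It expects the tab-separated "byte, code point, # comment" layout of the unicode.org tables.
* It uses only a handful of rows from ROMAN.TXT and CP1252.TXT.
* It treats any unlisted position in 0x00-0x7F as plain ASCII.

```python
# Hypothetical sketch (not UTF8::Simple's code): build a pair of converter
# closures for one single-byte charset from a unicode.org mapping table,
# then chain two charsets through UTF-8.

def parse_mapping(text):
    """Return {byte value: Unicode code point} from a mapping table."""
    table = {}
    for line in text.splitlines():
        data = line.split("#", 1)[0].split()  # drop the comment column
        if len(data) >= 2:                    # skip blank and #UNDEFINED rows
            table[int(data[0], 16)] = int(data[1], 16)
    return table


def make_converters(table):
    """Return (to_utf8, from_utf8) closures for one single-byte charset."""
    # Assumption: positions 0x00-0x7F missing from the table are plain ASCII.
    to_uni = {b: b for b in range(0x80)}
    to_uni.update(table)
    from_uni = {cp: b for b, cp in to_uni.items()}

    def to_utf8(data):
        # legacy-charset bytes -> UTF-8 bytes
        return "".join(chr(to_uni[b]) for b in data).encode("utf-8")

    def from_utf8(data):
        # UTF-8 bytes -> legacy-charset bytes; raises KeyError if a
        # character has no slot in this charset
        return bytes(from_uni[ord(ch)] for ch in data.decode("utf-8"))

    return to_utf8, from_utf8


# Small excerpts of ROMAN.TXT and CP1252.TXT (German umlauts and sharp s);
# a real program would read the complete files instead.
MAC_ROMAN = """\
0x80\t0x00C4\t# LATIN CAPITAL LETTER A WITH DIAERESIS
0x85\t0x00D6\t# LATIN CAPITAL LETTER O WITH DIAERESIS
0x86\t0x00DC\t# LATIN CAPITAL LETTER U WITH DIAERESIS
0x8A\t0x00E4\t# LATIN SMALL LETTER A WITH DIAERESIS
0x9A\t0x00F6\t# LATIN SMALL LETTER O WITH DIAERESIS
0x9F\t0x00FC\t# LATIN SMALL LETTER U WITH DIAERESIS
0xA7\t0x00DF\t# LATIN SMALL LETTER SHARP S
"""

CP1252 = """\
0x81\t\t#UNDEFINED
0xC4\t0x00C4\t#LATIN CAPITAL LETTER A WITH DIAERESIS
0xD6\t0x00D6\t#LATIN CAPITAL LETTER O WITH DIAERESIS
0xDC\t0x00DC\t#LATIN CAPITAL LETTER U WITH DIAERESIS
0xDF\t0x00DF\t#LATIN SMALL LETTER SHARP S
0xE4\t0x00E4\t#LATIN SMALL LETTER A WITH DIAERESIS
0xF6\t0x00F6\t#LATIN SMALL LETTER O WITH DIAERESIS
0xFC\t0x00FC\t#LATIN SMALL LETTER U WITH DIAERESIS
"""

mac_to_utf8, _ = make_converters(parse_mapping(MAC_ROMAN))
_, utf8_to_cp1252 = make_converters(parse_mapping(CP1252))


def mac_to_windows(data):
    """Recode Mac Roman bytes to Windows-1252 bytes via UTF-8."""
    return utf8_to_cp1252(mac_to_utf8(data))


print(mac_to_windows(b"Gr\x9a\xa7e"))  # b'Gr\xf6\xdfe' ("Größe")
```

Recoding Mac text for Windows then comes down to chaining two closures through UTF-8. Adding another charset only needs one more table, not a new pairwise converter. In Perl, the same idea would be two anonymous subs closing over the two lookup hashes.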

-- 
        Bart.