On Sat, 5 Jul 2003 17:00:14 +0200, Martin Buchmann wrote:

>i was wondering if something like the recode program exists in pure
>perl, i.e. a module which allows you to convert different 8bit ASCII
>character sets, e.g., DOS -> Mac. Often i have problems when people
>are providing 8bit ASCII files with an origin other than mac because
>of the german umlauts, etc.

Well, I have something: UTF8::Simple. If you look back through the old archives, you may see it discussed on this list long ago. I never released it on CPAN, and I'm not sure it's still all that relevant these days, with 5.8 having come out, but it's still very much in use at some of our own sites.

It still needs an update: there are some incompatibility issues between Perl 5.005 (and earlier) and 5.6 and later. The result is that I now have separate versions for these perl versions, and I'd like to merge them back. Anyway, you're very welcome to use it, just say the word.

It works like this: UTF-8 is used as the intermediate format. One of the reasons the module is called "UTF8::Simple" is that it only handles single-byte character sets. I create closures that can convert a single-byte character string to UTF-8, or back, as stipulated by one of the encoding tables that can be found at unicode.org's website,

<http://www.unicode.org/Public/MAPPINGS/>

mostly under "VENDORS", like

<http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/MAC/ROMAN.TXT>

for Mac text, and

<http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1252.TXT>

for Windows.

--
Bart.
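
P.S. To illustrate the closure idea, here is a minimal sketch. It is not the actual UTF8::Simple code: the names make_converters and utf8_bytes are made up for this example, and the two tables are tiny hand-typed excerpts in the "0xXX 0xXXXX # comment" layout of the unicode.org mapping files, covering only a few German characters. It also assumes that bytes below 0x80 are plain ASCII in both character sets, and it replaces anything it cannot map with "?".

```perl
use strict;
use warnings;

# Encode one code point (below 0x10000) as a string of UTF-8 bytes.
sub utf8_bytes {
    my ($cp) = @_;
    return pack("C", $cp) if $cp < 0x80;
    return pack("C2", 0xC0 | ($cp >> 6), 0x80 | ($cp & 0x3F)) if $cp < 0x800;
    return pack("C3", 0xE0 | ($cp >> 12),
                      0x80 | (($cp >> 6) & 0x3F),
                      0x80 | ($cp & 0x3F));
}

# Hypothetical helper: parse a mapping table and return two closures,
# one from the single-byte charset to UTF-8 and one back again.
sub make_converters {
    my ($table_text) = @_;
    my (%to_utf8, %from_utf8);
    for my $line (split /\n/, $table_text) {
        # Data lines look like "0x8A  0x00E4  # comment"; skip the rest.
        next unless $line =~ /^0x([0-9A-Fa-f]{2})\s+0x([0-9A-Fa-f]{4})/;
        my $byte = chr(hex($1));
        my $utf8 = utf8_bytes(hex($2));
        $to_utf8{$byte}   = $utf8;
        $from_utf8{$utf8} = $byte;
    }
    my $encode = sub {
        my ($s) = @_;
        # Only high bytes are looked up; ASCII passes through untouched.
        $s =~ s/([\x80-\xFF])/exists $to_utf8{$1} ? $to_utf8{$1} : '?'/ge;
        return $s;
    };
    my $decode = sub {
        my ($s) = @_;
        # Match two- and three-byte UTF-8 sequences and map them back.
        $s =~ s/([\xC0-\xDF][\x80-\xBF]|[\xE0-\xEF][\x80-\xBF]{2})/exists $from_utf8{$1} ? $from_utf8{$1} : '?'/ge;
        return $s;
    };
    return ($encode, $decode);
}

# Excerpt in the style of MAC/ROMAN.TXT (a few German letters only).
my $mac_table = <<'END';
0x80  0x00C4  # LATIN CAPITAL LETTER A WITH DIAERESIS
0x8A  0x00E4  # LATIN SMALL LETTER A WITH DIAERESIS
0x9A  0x00F6  # LATIN SMALL LETTER O WITH DIAERESIS
0x9F  0x00FC  # LATIN SMALL LETTER U WITH DIAERESIS
0xA7  0x00DF  # LATIN SMALL LETTER SHARP S
END

# Excerpt in the style of WINDOWS/CP1252.TXT for the same letters.
my $win_table = <<'END';
0xC4  0x00C4  # LATIN CAPITAL LETTER A WITH DIAERESIS
0xE4  0x00E4  # LATIN SMALL LETTER A WITH DIAERESIS
0xF6  0x00F6  # LATIN SMALL LETTER O WITH DIAERESIS
0xFC  0x00FC  # LATIN SMALL LETTER U WITH DIAERESIS
0xDF  0x00DF  # LATIN SMALL LETTER SHARP S
END

my ($mac_to_utf8, $utf8_to_mac) = make_converters($mac_table);
my ($win_to_utf8, $utf8_to_win) = make_converters($win_table);

# "Gruesse" with u-umlaut and sharp s, as Windows bytes, converted to
# Mac bytes by going through UTF-8 in the middle.
my $win_text = "Gr\xFC\xDFe";
my $mac_text = $utf8_to_mac->($win_to_utf8->($win_text));

print join(" ", map { sprintf "%02X", ord } split //, $mac_text), "\n";
# prints "47 72 9F A7 65"
```

With UTF-8 as the hub, each character set needs only one table and one pair of closures, and any conversion is two calls chained together. With the full table files you would read the text from the file instead of the here-documents.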