Hi,
There is a character map repository at http://std.dkuug.dk/i18n/charmaps/
You might find it useful. The maps show both the original encoding and the
corresponding Unicode code point. They shouldn't be too hard to parse.
Once you have the Unicode code point corresponding to a character, you
can pack it into a UTF-8 character using:
$UTF8Char = pack("U*",hex($strCodePoint));
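
For what it's worth, here is a rough sketch of how the parsing and the
conversion could fit together. It assumes each charmap entry looks like
"<U00E9>     /xe9         LATIN SMALL LETTER E WITH ACUTE" (check the files
you actually download), that the encoding is single-byte, and that you have
Perl 5.8 or later for utf8::encode. The names parse_charmap and to_utf8 are
made up for this example.

```perl
use strict;
use warnings;

# Build a byte value => Unicode code point table from charmap-style lines.
# Assumed entry format: <Uxxxx>  /xNN  CHARACTER NAME
sub parse_charmap {
    my (@lines) = @_;
    my %map;
    for my $line (@lines) {
        # Header lines, comments and multi-byte entries do not match and are skipped.
        next unless $line =~ m{^<U([0-9A-Fa-f]{4,8})>\s+/x([0-9A-Fa-f]{2})(?:\s|$)};
        $map{ hex($2) } = hex($1);
    }
    return \%map;
}

# Convert a string of single-byte-encoded octets into UTF-8 octets.
# Bytes missing from the table become U+FFFD (REPLACEMENT CHARACTER).
sub to_utf8 {
    my ($map, $bytes) = @_;
    my $chars = join '', map {
        my $cp = $map->{ ord $_ };
        defined $cp ? chr($cp) : "\x{FFFD}";
    } split //, $bytes;
    utf8::encode($chars);    # character string -> UTF-8 octets, in place
    return $chars;
}

# Two sample entries in the assumed format, not a real charmap file.
my @sample = (
    '<U0041>     /x41         LATIN CAPITAL LETTER A',
    '<U00E9>     /xe9         LATIN SMALL LETTER E WITH ACUTE',
);
my $map  = parse_charmap(@sample);
my $utf8 = to_utf8($map, "A\xE9");    # $utf8 now holds the octets "A\xC3\xA9"
```

In practice you would read the lines from the charmap file for whichever
encoding the country clue suggests, and build one table per encoding.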
/Henrik
> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED]]On Behalf Of
> [EMAIL PROTECTED]
> Sent: Friday, May 03, 2002 6:55 PM
> To: [EMAIL PROTECTED]
> Subject: Guess native character encodings for conversion to UTF-8 in
> XML?
>
>
> Hi all,
> I have giant batches of invoices from around the world. I'm trying to
> stick their native text into an XML document as UTF-8. The thing is, the
> text I'm seeing looks *almost* like us-ascii. Some things, like funny
> characters, just don't quite pass muster. So do you know of any standard
> way to coerce us-ascii-ish-but-not-quite data into UTF-8, where it should
> be able to live happily? I do get clues to the country involved, if that
> helps.
>
> Josh
> _______________________________________________
> ActivePerl mailing list
> [EMAIL PROTECTED]
> To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs
>