On Nov 24, 2012, at 10:21, Bob Cronin wrote:

> Utf8 isn't a character set per se it is an algorithm for encoding unicode
> characters using mail-safe 8-bit quantities. A fine point but important to
> realize nonetheless. So to deal with Utf8 on a mainframe you have to be
> able to undo the 8bit encoding algorithm to yield ascii unicode and then
> use the appropriate ascii to ebcdic translation table to convert the ascii
> unicode to the target ebcdic character set.
> ...

Isn't "ascii unicode" somewhat oxymoronic?

There isn't an "ASCII Unicode", nor an "EBCDIC Unicode", nor a "Baudot Unicode", ... There's one Unicode (that's what the "Uni-" means), but, yes, numerous representations for transmission and compaction, one of which is UTF-8. (And part of the code space is reserved for local purposes (Klingon?). I suspect that's not a concern for the OP.)
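To illustrate "one Unicode, many representations": here's a small Python sketch (Python chosen only for illustration; the codec names are Python's, and cp037 is just one EBCDIC code page that ships with its standard library). It takes a single code point and shows its bytes in several encodings, and also shows a code point from the Private Use Area, the "local purposes" part of the code space.

```python
import unicodedata

# One abstract character: U+00E9 LATIN SMALL LETTER E WITH ACUTE.
ch = "\u00e9"
print(f"U+{ord(ch):04X}", unicodedata.name(ch))

# The same code point in several representations. None of these *is*
# Unicode; each is a way of serializing the code point as bytes.
for codec in ("utf-8", "utf-16-be", "utf-32-be", "cp037"):
    print(f"{codec:10} {ch.encode(codec).hex(' ')}")

# Pure ASCII text is byte-for-byte identical in UTF-8; only code points
# above U+007F turn into multi-byte sequences.
assert "A".encode("utf-8") == "A".encode("ascii")

# A Private Use Area code point: Unicode assigns it no meaning, so any
# interpretation (Klingon or otherwise) is a private agreement.
pua = "\ue000"
print(f"U+{ord(pua):04X}", unicodedata.category(pua))  # 'Co' = private use
```

Undoing UTF-8 yields code points, not "ASCII". Mapping them into an EBCDIC code page is a separate step, and it only succeeds for characters that code page can represent.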
Some filters, such as iconv(1), will convert UTF-8 to an EBCDIC code page with a single command. But their hidden internal operation may be as you suggest.

"[M]ail-safe 8-bit" is a bit of wishful thinking. Almost any MUA I use, on encountering a character value >127, will cautiously further encode as either quoted-printable or base64. This happens somewhere along the route even when the proximate MTA agrees in the SMTP handshaking (8BITMIME) to accept 8-bit.

-- gil
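A minimal Python sketch of the two-step conversion that a single `iconv -f UTF-8 -t <codepage>` command hides. The helper name `utf8_to_ebcdic` is hypothetical, and cp037 stands in for whatever EBCDIC code page the target system actually uses. The sketch does not describe how iconv itself is implemented. The last lines show what quoted-printable and base64 transfer encodings do to the same 8-bit bytes, which is what a cautious MUA may apply before sending.

```python
import base64
import quopri

def utf8_to_ebcdic(data: bytes, codepage: str = "cp037") -> bytes:
    """Hypothetical helper: decode UTF-8 to code points, then encode those
    code points in an EBCDIC code page. Raises UnicodeEncodeError for any
    character the target code page cannot represent (e.g. the euro sign
    in cp037)."""
    text = data.decode("utf-8")   # step 1: UTF-8 bytes -> Unicode code points
    return text.encode(codepage)  # step 2: code points -> EBCDIC bytes

utf8_bytes = "café".encode("utf-8")  # b'caf\xc3\xa9': 5 bytes for 4 characters
ebcdic = utf8_to_ebcdic(utf8_bytes)
print(ebcdic.hex(" "))               # 4 bytes, one per character

# The same UTF-8 bytes after a 7-bit-safe transfer encoding:
print(quopri.encodestring(utf8_bytes))  # b'caf=C3=A9'
print(base64.b64encode(utf8_bytes))     # b'Y2Fmw6k='
```

Both transfer encodings yield pure 7-bit output, so the 8-bit bytes survive any hop that refuses them. The receiving MUA has to undo that layer before the UTF-8 (and then the code-page conversion) can be handled at all.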
