On Mar 6, 2010, at 15:27, Bob Cronin wrote:
> Alrighty then, I guess we'll stick with the node-based default. Nobody has
> complained (much) about that for 7-8 years. We do provide a way they can
> override the default (either by spooling the punch with distcode cp:nnnn or
> adding a hokey optional RFC822-style comment to the subject line in the form
> "(cp:nnnn mmmm)", where nnnn is the source EBCDIC codepage and mmmm is the
> desired ASCII codepage). For a given EBCDIC codepage we choose what we
It would seem far less hokey to put in a conventional RFC 1521 header,
e.g. "Content-type: text/plain; charset=IBM-939" and let the next layer
handle the conversion of both the body and the header to ASCII
equivalents. The source character set could come from a
hierarchy:
    specification by programmer
    node-based
    GLOBALV
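
A minimal sketch of that lookup, assuming the list order above is the order of precedence (that ordering is my reading, not something either of us has pinned down). The function names, the "CHARSET" key, and the use of a plain dict to stand in for GLOBALV variables are all hypothetical:

```python
from typing import Mapping, Optional


def resolve_source_charset(
    programmer_spec: Optional[str],
    node_default: Optional[str],
    globalv: Mapping[str, str],
    globalv_key: str = "CHARSET",  # hypothetical variable name
) -> Optional[str]:
    """Return the first source character set found, or None if none is set.

    Assumed precedence: explicit programmer specification, then the
    node-based default, then a GLOBALV-style variable.
    """
    for candidate in (programmer_spec, node_default, globalv.get(globalv_key)):
        if candidate:
            return candidate
    return None


def content_type_header(charset: str) -> str:
    """Format a conventional MIME header naming the source charset."""
    return f"Content-Type: text/plain; charset={charset}"
```

For example, resolve_source_charset(None, "IBM-939", {}) falls through to the node-based value, and content_type_header() turns that into the header line the next layer would read.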
> believe to be the best match, ASCII-wise (e.g. if we get EBCDIC 939, we
> emit ASCII 943, aka x-sjis). We do the same in the other direction (i.e. we
> choose the EBCDIC codepage we believe matches the input ASCII character set
> the best). ...
Always be certain that the graphemes provided by the ASCII character set
are a superset of those in the EBCDIC character set. As a last resort,
use UTF-8.
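
One way to sketch that check, using Python's built-in codecs as stand-ins (cp037 for an EBCDIC page, latin-1 and ascii for targets). It only handles single-byte source code pages, and the function names are made up for illustration:

```python
def repertoire(codec: str) -> set:
    """Collect every character a single-byte codec can decode."""
    chars = set()
    for b in range(256):
        try:
            chars.add(bytes([b]).decode(codec))
        except UnicodeDecodeError:
            pass  # byte value is undefined in this code page
    return chars


def covers(target: str, source: str) -> bool:
    """True if every character of `source` can be encoded in `target`."""
    for ch in repertoire(source):
        try:
            ch.encode(target)
        except UnicodeEncodeError:
            return False
    return True


def choose_target(source: str, preferred: str) -> str:
    """Use the preferred target only if it is a superset; else UTF-8."""
    return preferred if covers(preferred, source) else "utf-8"
```

With these stand-ins, choose_target("cp037", "latin-1") keeps latin-1, while choose_target("cp037", "ascii") falls back to UTF-8 because 7-bit ASCII cannot represent the accented letters in code page 037.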
> We've got support for over 50 different ASCII and 50 different
> EBCDIC codepages (single and double byte). We derive the SBCS tables
> on-the-fly by converting from source to Unicode and then to target (although
> once we've derived a particular from/to table, we cache it so that we don't
> have to re-derive it until the next recycle). We don't cache DBCS tables
> (but we do have plugins for several of the more highly used ones to avoid
> the overhead of the trip to/from Unicode for every email).
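
The derive-through-Unicode-and-cache scheme described above might look roughly like this. It is a sketch, not the actual implementation: it uses Python's codecs (cp037, latin-1) in place of the real tables, the names are hypothetical, and the "?" substitute for unmappable bytes is my assumption:

```python
from functools import lru_cache


@lru_cache(maxsize=None)  # cache lives until the process is recycled
def sbcs_table(source: str, target: str, substitute: bytes = b"?") -> bytes:
    """Build a 256-entry byte-to-byte table: source -> Unicode -> target."""
    out = bytearray()
    for b in range(256):
        try:
            encoded = bytes([b]).decode(source).encode(target)
        except UnicodeError:
            encoded = substitute  # undefined in source or missing in target
        if len(encoded) != 1:
            encoded = substitute  # target is not single-byte for this char
        out += encoded
    return bytes(out)


def convert(data: bytes, source: str, target: str) -> bytes:
    """Translate a single-byte-encoded buffer using the cached table."""
    return data.translate(sbcs_table(source, target))
```

The first call for a given from/to pair pays for 256 round trips through Unicode; later calls reuse the cached table, so the per-message cost is a single bytes.translate().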
I hate EBCDIC!
-- gil