To finish up the Japanese support for Mailman, I'm going to dive in and start by adding support for MIME-encoding and decoding header lines (with either quoted-printable or base64, whichever is appropriate).
Right now, no matter what language is enabled, the localized emails sent out through the virgin queue are sent verbatim. We need to:

1) Encode the message (headers and body) with the encoding that the locale uses for email. This is needed for EUC-JP to iso-2022-jp. It can also be used for Big5 and CN to iso-2022-cn (see http://www.imc.org/rfc1922), but I don't know if any Chinese mail readers actually support iso-2022-cn.

2) MIME-encode the Subject field, specifying the character set appropriate for the list's locale. Use quoted-printable for ASCII-like charsets, base64 for non-ASCII-like charsets.

3) Set the charset in the Content-Type: header of the mail, again appropriate for the list's locale.

4) (Sometimes) MIME-encode the body with base64, for 8-bit character sets.

In a perfect world, all email would be 7-bit, but going by the Taiwanese spam I receive, people don't seem to send Chinese mail in iso-2022-cn. Instead, the common thing to do seems to be sending 8-bit Big5 mail that's base64 encoded (or not! I get 8-bit email directly in Big5 from time to time in my spam folder).

I looked at the email and mimify modules, and neither of them exposes a proper interface for MIME-encoding headers. Well, mimify *tries*, really it does, but it forces quoted-printable, which makes no sense for Asian languages, and it does line-wrapping incorrectly. That is a HUGELY important issue with double-byte encodings, which become corrupt if a line is wrapped between the two bytes of a double-byte character.

I think the proper place for this is in the email module, but I don't want to re-invent the wheel. (Though I do understand the issue very well, and have written code to do this by hand before, including supporting double-byte charsets properly.) Can we get the code from somewhere else, or should I write up encode_header and decode_header methods for the email.Message class?

Next, we need to come up with a table mapping languages to the encodings they use for email.
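For the header-encoding question above, here is a minimal sketch of what such an interface could look like. It assumes a present-day Python 3, where the standard library ships an email.header module; that is an assumption of this example, not something the modules discussed above provide. The function names encode_header and decode_header_value are hypothetical helpers chosen for illustration.

```python
from email.header import Header, decode_header, make_header


def encode_header(text, charset, header_name="Subject"):
    """Return `text` as an RFC 2047 encoded header value in `charset`.

    Hypothetical helper: the Header class chooses base64 or
    quoted-printable from the charset and takes care of folding
    long values onto continuation lines.
    """
    return Header(text, charset=charset, header_name=header_name).encode()


def decode_header_value(raw):
    """Decode an RFC 2047 encoded header value back into a plain string."""
    return str(make_header(decode_header(raw)))
```

Usage would be along the lines of encode_header("Grüße", "iso-8859-1") for a quoted-printable Subject and encode_header("日本語のテスト", "iso-2022-jp") for a base64 one, with decode_header_value() reversing either.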
Right now, these are the encodings used for our supported languages (from Defaults.py):

    def add_language(code, description, charset):
        LC_DESCRIPTIONS[code] = (description, charset)

    add_language('big5', _('Traditional Chinese'), 'big5')
    add_language('de',   _('German'),              'iso-8859-1')
    add_language('en',   _('English (USA)'),       'us-ascii')
    add_language('es',   _('Spanish (Spain)'),     'iso-8859-1')
    add_language('fr',   _('French'),              'iso-8859-1')
    add_language('gb',   _('Simplified Chinese'),  'gb2312')
    add_language('hu',   _('Hungarian'),           'iso-8859-1')
    add_language('it',   _('Italian'),             'iso-8859-1')
    add_language('ja',   _('Japanese'),            'euc-jp')
    add_language('no',   _('Norwegian'),           'iso-8859-1')
    add_language('ru',   _('Russian'),             'koi8-r')

We need another mapping, from 'code' to 'email charset conversion', 'header mime method', and 'body mime method'. (The last one may not be necessary if we are converting to a 7-bit encoding.) This is what I understand is actually supported by the email clients people around the world use, but I could be very wrong.

    email_charsets = {
        # code   mail conv      header enc  body enc
        'big5': [None,          'base64',   'base64'],
        'de':   [None,          'qp',       'qp'],
        'en':   [None,          None,       None],
        'es':   [None,          'qp',       'qp'],
        'fr':   [None,          'qp',       'qp'],
        'gb':   [None,          'base64',   'base64'],  # just a guess! use iso-2022-cn?
        'hu':   [None,          'qp',       'qp'],      # I thought Hungarian was iso-8859-2?
        'it':   [None,          'qp',       'qp'],
        'ja':   ['iso-2022-jp', 'base64',   None],
        'no':   [None,          'qp',       'qp'],
        'ru':   [None,          'base64',   None],      # I assume koi8-r is 7-bit..
    }

I was surprised to see that we specify iso-8859-1 for Hungarian; I'm pretty sure it uses accented vowels that are only in iso-8859-2.

Also, I don't know if people actually use iso-2022-cn in the Real World. The RFCs suggest using it, but I get the feeling it's not actually supported by Chinese email clients. Anyone know? If we can use it, then we should for both big5 and gb.
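To make the proposed mapping concrete, here is a rough sketch of how the body of an outgoing message could be prepared from such a table. It assumes Python 3 and its built-in codecs; the names LANGUAGE_CHARSETS, EMAIL_CHARSETS and prepare_body are hypothetical, and only a subset of the languages is shown.

```python
import base64
import quopri

# Hypothetical subset of the Defaults.py charsets: code -> list charset.
LANGUAGE_CHARSETS = {
    "big5": "big5",
    "de": "iso-8859-1",
    "en": "us-ascii",
    "ja": "euc-jp",
}

# Hypothetical subset of the proposed table:
# code -> (mail charset or None, header encoding, body encoding)
EMAIL_CHARSETS = {
    "big5": (None, "base64", "base64"),
    "de": (None, "qp", "qp"),
    "en": (None, None, None),
    "ja": ("iso-2022-jp", "base64", None),
}


def prepare_body(code, text):
    """Return (charset, transfer_encoding, payload_bytes) for a list language."""
    mail_charset, _header_enc, body_enc = EMAIL_CHARSETS[code]
    # Fall back to the list's own charset when no conversion is specified.
    charset = mail_charset or LANGUAGE_CHARSETS[code]
    raw = text.encode(charset)
    if body_enc == "base64":
        return charset, "base64", base64.encodebytes(raw)
    if body_enc == "qp":
        return charset, "quoted-printable", quopri.encodestring(raw)
    # No body encoding: in this subset the charset is already 7-bit safe.
    return charset, "7bit", raw
```

With this sketch, prepare_body("ja", ...) yields an iso-2022-jp payload that needs no transfer encoding, while "de" and "big5" come back quoted-printable and base64 encoded respectively. The returned charset and transfer encoding are what would go into the Content-Type: and Content-Transfer-Encoding: headers.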
In any case, there's a wonderful Python 2.0 codec module, which I'm testing now, that makes it possible to convert to and from Japanese. It should be good for our purposes; we could just ship it in 'misc' for folks who need Japanese:

http://pseudo.grad.sccs.chukyo-u.ac.jp/~kajiyama/python/

I am VERY unhappy with the historically used kconv.py module, which has thrown tracebacks at me whenever it sees encodings it doesn't understand.

Ben

-- 
Brought to you by the letters R and Y and the number 9.
"Hoosh is a kind of soup."
Debian GNU/Linux maintainer of Gimp and GTK+ -- http://www.debian.org/
