Hello Ben,

On Wed, Nov 21, 2001 at 06:29:41PM +0900, Ben Gertzfield wrote:
>
> Mikhail> The most serious bug I see here is that messages encoded
> Mikhail> in base64 still get decorated with plaintext.
>
> Headers or bodies?
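
To make the body case concrete, here is a minimal sketch in present-day Python 3 (not Mailman code) of what a decoder sees when a plain-text footer is appended to a base64 body. The sample text and footer are made up for illustration. A lenient decoder such as `base64.b64decode` with its default settings skips characters outside the base64 alphabet and treats the footer's letters as more encoded data; a stricter decoder might reject the input instead.

```python
import base64

# Hypothetical sample body: a short Russian greeting in KOI8-R.
# It is 6 bytes long, so its base64 form needs no "=" padding.
original = "Привет".encode("koi8_r")
encoded = base64.b64encode(original)

# A plain-text footer appended after the base64 stream (illustrative).
footer = b"\n-- \nMailman-Developers mailing list\n"
decorated = encoded + footer

# Default (non-validating) decoding discards spaces, newlines and "-",
# but the footer's letters are valid base64 characters and get decoded too.
decoded = base64.b64decode(decorated)

body_intact = decoded.startswith(original)
trailing_garbage = decoded[len(original):]

print(body_intact)        # the real text is still at the front
print(trailing_garbage)   # bytes that were never part of the message
```

Putting the footer in its own MIME part avoids this, because the base64 stream then ends at a part boundary and is never mixed with other text.
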
Oh, and headers hurt too: when someone replies with the mailing list
label in the subject hidden inside a base64 encoded word, Mailman slaps
on another label, ad infinitum. The subject content should be decoded
before searching for the label.

> Are you talking about the footer tacked on to the
> end of messages? If so, it would be simple with the new message
> structure to make the footer be a separate text part. Though, I
> don't see how adding some plain text after the end of the boundary
> could be corrupted; could you put an example corrupted message up?

I cannot find one right now, but I see them every now and then on our
Russian lists. Base64 is not a robust encoding; any non-base64 text
appended to a base64 stream produces garbage when decoded. Decorating
such messages with a separate MIME part would be a better solution than
fiddling with decoding/recoding.

> Mikhail> Another problem is encoded messages in archives. Heck,
> Mikhail> look at this list's archive to see what I'm talking
> Mikhail> about. Those should also be decoded and have character
> Mikhail> set converted to some uniform one. I'd suggest UTF-8, but
> Mikhail> many browsers and text viewers still don't grok this
> Mikhail> charset, so it'd better be selectable as well.
>
> I talked with Barry about this today. My solution is to "guess" the
> character set based on whichever is most common in the archives, and
> use that as the charset specified in the HTML.

That is unreliable, can change over time, and will certainly cause
problems. Leave the administrator in control of which charset the list
archives are served in. For storage, I'd still choose to encode
everything as UTF-8; this makes the archives independent of the target
charset and resolves the problems with multi-language messages.

> For any messages with
> multi-language subjects or bodies, the main language will be left
> in the normal character set, and the multi-language parts will be
> encoded with the UTF-8 HTML entity.
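
A rough sketch of the two steps discussed here, UTF-8 for storage and numeric character references for display, written in present-day Python 3 rather than against the Mailman code. The function names and the sample string are hypothetical; `xmlcharrefreplace` is the standard codec error handler that turns unencodable characters into `&#NNNN;` references.

```python
import html


def to_utf8_storage(raw: bytes, source_charset: str) -> bytes:
    """Convert a message body from its declared charset to UTF-8."""
    return raw.decode(source_charset).encode("utf-8")


def to_ascii_html(stored: bytes) -> str:
    """Render stored UTF-8 text as pure-ASCII HTML.

    Markup characters are escaped first; every remaining non-ASCII
    character then becomes a numeric character reference, so the page
    displays correctly whatever charset it is served in.
    """
    text = html.escape(stored.decode("utf-8"))
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


# Hypothetical archived body that arrived in KOI8-R.
raw = "Привет <world>".encode("koi8_r")
stored = to_utf8_storage(raw, "koi8_r")
page = to_ascii_html(stored)
print(page)
```

The same `to_ascii_html` step could be limited to the characters that do not fit the charset chosen by the list administrator, by encoding to that charset instead of ASCII.
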
For starters, this could be done for all non-ASCII symbols.

> This will require Python unicode codecs for all our languages, which
> do not exist for KOI-8, Big5, or GB, as far as I know.

iconv-based codecs should exist for these; I'll have to check.

-- 
Stay tuned,
  MhZ                                     JID: [EMAIL PROTECTED]
___________
The best audience is intelligent, well-educated and a little drunk.
                -- Maurice Baring