On Wednesday 12 October 2005 04:31, Paul Querna wrote:

> > An outline of what needs to be done can be found here:
> >
> > http://intertwingly.net/stories/2005/09/28/xchar.rb

Erm, no. We need to reencode from any incoming charset. We don't need to
reinvent any wheels by recreating individual charset conversion tables.

> Right now mod_mbox does *no* encoding translation. We really need to be
> calling apr_xlate all over, and turning everything into UTF-8 First.
> Currently, each item is encoded in whatever the client program sent it
> as... which isn't good.

Even the HTML is erroneously sent as iso-8859-1, so posts that arrive as
utf-8 (eg from wrowe) display incorrectly! As of now it's not really fit
for purpose. We should fix this for the benefit of all display formats,
rather than address html, atom, or indeed anything else in isolation.

Regarding the mail archives, the ideal solution would be to transcode
incoming messages to a homogeneous utf-8 before storing them. To make that
useful, we'd need to transcode the existing archives too, though that
would just be a one-off script. I see a mod_smtpd filter thrashing around
that to-do list ... dammit, it's the long-awaited updates to charset_lite!

The harder bit to deal with is _local_ encoding in different charsets
in header lines. That's a PITA, and is AFAIK peculiar to SMTP.

-- 
Nick Kew
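
For illustration only, here is a rough sketch of the two transcoding steps discussed above: bringing a message body to utf-8, and decoding header lines that carry their own per-word charsets. It is written in Python with the standard `email` package rather than C with apr_xlate, purely to keep it short; it is not mod_mbox code. The function names `header_to_unicode` and `body_to_utf8` and the iso-8859-1 fallback are assumptions made for this example, and it assumes the header encoding in question is the RFC 2047 encoded-word form. Conversion is left to the platform's existing codecs, so no charset tables are recreated.

```python
from email import message_from_bytes
from email.header import decode_header


def header_to_unicode(raw_value, fallback="iso-8859-1"):
    """Decode a header value whose encoded-words may each use a different charset."""
    parts = []
    for chunk, charset in decode_header(raw_value):
        if isinstance(chunk, bytes):
            try:
                parts.append(chunk.decode(charset or fallback, errors="replace"))
            except LookupError:
                # Unknown charset label: fall back rather than fail (assumed policy).
                parts.append(chunk.decode(fallback, errors="replace"))
        else:
            # decode_header returns plain str when nothing was encoded.
            parts.append(chunk)
    return "".join(parts)


def body_to_utf8(raw_message, fallback="iso-8859-1"):
    """Return the body of a single-part message re-encoded as utf-8 bytes."""
    msg = message_from_bytes(raw_message)
    payload = msg.get_payload(decode=True) or b""
    charset = msg.get_content_charset() or fallback
    try:
        text = payload.decode(charset, errors="replace")
    except LookupError:
        text = payload.decode(fallback, errors="replace")
    return text.encode("utf-8")
```

A one-off pass over an existing archive could apply the same two functions to each stored message. Multipart messages would need a walk over their parts, which this sketch leaves out.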