Nando wrote:
> OK, I get question number 2 now. My question was:
>
> 2) Is there some flaw in decode_header()? Something that Thunderbird
> displays as "Eduardo & Mônica" is being decoded with the wrong character
> in place of the ô:
> repr(decode_header(m["subject"])[0][0])
> 'Eduardo & M\xf4nica'
> The header being tested is:
> Subject: =?iso-8859-1?Q?Eduardo_&_M=F4nica?=
> If we are again doing the Right Thing, then why does Thunderbird
> display it the way it was intended?
>
>
> The answer is I have to use codecs.decode():
>
> import codecs
>
> In [20]: [(s, encoding)] = decode_header("=?iso-8859-1?Q?P=F4nei?=")
>
> In [21]: s
> Out[21]: 'P\xf4nei'
>
> In [22]: encoding
> Out[22]: 'iso-8859-1'
>
> In [23]: print codecs.decode(s, encoding)
> Pônei
>
> Well, that just makes it even harder to use the return value of the
> decode_header() function. And instead of encapsulating all that
> complexity in the email library, you are forcing every user of the
> library to find all this out by himself, just as I had to.

It gets harder, as you are not handling Unicode domain names. Code to
convert email addresses between their ASCII and Unicode representations
can be found at http://stuartbishop.net/Software/EmailAddress/

(Barry - we should discuss getting code to do this into the standard
library again. I think I opened a bug on this soon after I wrote it -
in 2004!)

It is a bit of a learning curve, and I suspect that most users of the
library have written the same or similar helpers, possibly several
times. For example, the nearly mandatory header decoder:

    import email.Header

    def decode_header(s):
        '''Decode an RFC2047 email header into a Unicode string.'''
        # Split the header into (string, charset) pairs; charset is
        # None for parts that were not RFC 2047 encoded.
        s = email.Header.decode_header(s)
        # Decode each part with its declared charset, assuming ASCII
        # when no charset was given.
        s = [b[0].decode(b[1] or 'ascii') for b in s]
        # Join the decoded pieces into one Unicode string.
        return u''.join(s)

-- 
Stuart Bishop <[EMAIL PROTECTED]>
http://www.stuartbishop.net/
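For readers on Python 3, where the module is email.header and decode_header() returns bytes for encoded words but a plain str when a header has no encoded words at all, the helper above needs an extra type check. The sketch below is a hedged illustration, not Stuart's code and not the EmailAddress package. The names decode_header_text, domain_to_ascii, domain_to_unicode, address_to_ascii and address_to_unicode are hypothetical. The domain helpers rely on Python's built-in "idna" codec, which implements IDNA 2003 (RFC 3490) only, and they deliberately leave the local part of an address untouched.

```python
import email.header


def decode_header_text(value):
    """Decode an RFC 2047 header value into a str (hypothetical helper).

    email.header.decode_header() yields (bytes, charset) pairs when the
    header contains encoded words, but a single (str, None) pair when it
    contains none, so both cases are handled here.
    """
    parts = []
    for chunk, charset in email.header.decode_header(value):
        if isinstance(chunk, bytes):
            try:
                # Undecodable bytes become U+FFFD instead of raising.
                chunk = chunk.decode(charset or "ascii", errors="replace")
            except LookupError:
                # Unknown charset name: fall back to latin-1, which
                # accepts every byte value (an illustrative choice).
                chunk = chunk.decode("latin-1")
        parts.append(chunk)
    return "".join(parts)


def domain_to_ascii(domain):
    """Unicode domain name -> ASCII-compatible (punycode) form, IDNA 2003."""
    return domain.encode("idna").decode("ascii")


def domain_to_unicode(domain):
    """ASCII-compatible domain name -> Unicode form, IDNA 2003."""
    return domain.encode("ascii").decode("idna")


def address_to_ascii(address):
    """Hypothetical: IDNA-encode only the domain part of local@domain."""
    local, at, domain = address.rpartition("@")
    if not at:
        return address  # no "@": leave the value unchanged
    return local + at + domain_to_ascii(domain)


def address_to_unicode(address):
    """Hypothetical: inverse of address_to_ascii for the domain part."""
    local, at, domain = address.rpartition("@")
    if not at:
        return address  # no "@": leave the value unchanged
    return local + at + domain_to_unicode(domain)
```

With this sketch, decode_header_text("=?iso-8859-1?Q?Eduardo_&_M=F4nica?=") gives "Eduardo & Mônica", the string Thunderbird displays, and address_to_ascii("user@bücher.example") gives "user@xn--bcher-kva.example". On Python 3 the standard library can also handle the header side directly: str(email.header.make_header(email.header.decode_header(s))) returns a decoded string, and a message parsed with policy=email.policy.default returns already-decoded header values. Non-ASCII local parts are outside the scope of IDNA; they need SMTPUTF8 (RFC 6531) support instead.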
_______________________________________________
Email-SIG mailing list
[email protected]
Your options: http://mail.python.org/mailman/options/email-sig/archive%40mail-archive.com
