Re: [Mailman-Users] What causes "decoding Unicode is not supported"?

Mark Sapiro Thu, 03 Sep 2009 08:38:43 -0700

Rosenbaum, Larry M. wrote:
>
>ornl71# python
>Python 2.5 (r25:51908, Sep 20 2006, 06:18:53)
>[GCC 3.4.6] on sunos5
>Type "help", "copyright", "credits" or "license" for more information.
>>>> import email
>>>> email.__version__
>'4.0.1'


I don't know if there was a different email 4.0.1 distributed with
Python 2.5 as opposed to Python 2.5.1, or if yours is modified by Sun
in some way (if it is a Sun package), but the problem is in your
email/message.py get_content_charset method.

All the email 4.0.x versions I have define this method as in the
attached message.get_content_charset.txt file.

In your case, the statement

    charset = unicode(charset, 'us-ascii').encode('us-ascii')

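attempts to convert charset to unicode without first testing whether it
is already a unicode object, which it is in the problem case. Calling
unicode() with an encoding argument on a value that is already unicode
is what raises "decoding Unicode is not supported".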
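
To make the difference concrete, here is a minimal sketch that isolates
the unguarded statement and the guarded form used in the attached
method. The function names and the text_type/bytes_type aliases are
hypothetical, introduced only so that the same code runs on Python 2.6+
and Python 3; they are not part of the email package. On Python 3 the
equivalent error message reads "decoding str is not supported".

```python
# Hypothetical helper names; only the two wrapped statements mirror the thread.
text_type = type(u'')    # unicode on Python 2, str on Python 3
bytes_type = type(b'')   # str on Python 2, bytes on Python 3


def coerce_unguarded(charset):
    # Like the failing statement: always decode, even if charset is already text.
    return text_type(charset, 'us-ascii').encode('us-ascii')


def coerce_guarded(charset):
    # Like the attached method: decode only when charset is a byte string.
    if isinstance(charset, bytes_type):
        charset = text_type(charset, 'us-ascii')
    return charset.encode('us-ascii')


for value in (b'utf-8', u'utf-8'):
    try:
        coerce_unguarded(value)
        print('unguarded ok:    %r' % (value,))
    except TypeError as exc:
        print('unguarded fails: %r (%s)' % (value, exc))
    print('guarded ok:      %r' % (coerce_guarded(value),))
```

The unguarded form only fails for the text (unicode) input; the guarded
form accepts both.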

It appears there may be an additional incompatibility between Mailman
2.1.12 and Python 2.5 as opposed to Python 2.5.x. I'll note this in the
FAQ.

If you can easily upgrade to a later Python 2.5.x, I think that will
solve the problem. If not, you could patch
/usr/local/lib/python2.5/email/message.py by replacing the definition
of get_content_charset with that in the attached file.
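
Before and after upgrading or patching, a quick behavioral check along
these lines may help confirm whether a given Python's email package
trips over a unicode charset value. This is only a sketch:
charset_lookup_ok is a hypothetical name, and passing a unicode value to
set_param is an assumed stand-in for however the unicode charset arises
in the real failure, which this thread does not spell out.

```python
import email.message


def charset_lookup_ok():
    """Return True if get_content_charset() copes with a unicode charset."""
    msg = email.message.Message()
    msg['Content-Type'] = 'text/plain'
    # Assumption: a unicode parameter value approximates the problem case.
    msg.set_param('charset', u'UTF-8')
    try:
        msg.get_content_charset()
    except TypeError:
        # The unguarded unicode(charset, 'us-ascii') call fails this way.
        return False
    return True


print(charset_lookup_ok())
```

If this prints False, the installed get_content_charset is most likely
missing the isinstance test shown in the attached file.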

-- 
Mark Sapiro <m...@msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan

    def get_content_charset(self, failobj=None):
        """Return the charset parameter of the Content-Type header.

        The returned string is always coerced to lower case.  If there is no
        Content-Type header, or if that header has no charset parameter,
        failobj is returned.
        """
        missing = object()
        charset = self.get_param('charset', missing)
        if charset is missing:
            return failobj
        if isinstance(charset, tuple):
            # RFC 2231 encoded, so decode it, and it better end up as ascii.
            pcharset = charset[0] or 'us-ascii'
            try:
                # LookupError will be raised if the charset isn't known to
                # Python.  UnicodeError will be raised if the encoded text
                # contains a character not in the charset.
                charset = unicode(charset[2], pcharset).encode('us-ascii')
            except (LookupError, UnicodeError):
                charset = charset[2]
        # charset character must be in us-ascii range
        try:
            if isinstance(charset, str):
                charset = unicode(charset, 'us-ascii')
            charset = charset.encode('us-ascii')
        except UnicodeError:
            return failobj
        # RFC 2046, $4.1.2 says charsets are not case sensitive
        return charset.lower()

------------------------------------------------------
Mailman-Users mailing list
Mailman-Users@python.org
http://mail.python.org/mailman/listinfo/mailman-users
Mailman FAQ: http://wiki.list.org/x/AgA3
Searchable Archives: http://www.mail-archive.com/mailman-users%40python.org/
Unsubscribe: 
http://mail.python.org/mailman/options/mailman-users/archive%40jab.org

Security Policy: http://wiki.list.org/x/QIA9
