I decided to try to understand that weird decode/encode incantation, and 
learned about the "errors" keyword argument to the unicode function.  So now I 
have:

    bodies = message.bodies(content_type='text/html')
    allBodies = u""
    for body in bodies:
      allBodies = allBodies + u"\n" + unicode(goodDecode(body[1]), errors="ignore")
    if not allBodies:
      bodies = message.bodies(content_type='text/plain')
      for body in bodies:
        allBodies = allBodies + u"\n" + unicode(goodDecode(body[1]), errors="ignore")

This gets rid of the exception.
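
For anyone else puzzling over the "errors" argument, here is a minimal sketch of what the handlers do. It uses bytes.decode(), which accepts the same handler names as unicode(); the sample bytes are made up for illustration, reusing the 0xbe byte from the original error message.

```python
# A byte string containing 0xbe, which is not valid ASCII.
raw = b"caf\xbe au lait"

# The default ("strict") handler raises on the bad byte.
try:
    raw.decode("ascii")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# "ignore" silently drops bytes that cannot be decoded.
ignored = raw.decode("ascii", "ignore")

# "replace" substitutes U+FFFD (the replacement character) for each bad byte.
replaced = raw.decode("ascii", "replace")
```

Note that "ignore" loses data without leaving a trace, while "replace" at least marks where something was dropped.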

This particular email message also gives me errors when I try to deal with the 
attachments.  Here's my code:

      if hasattr(message, 'attachments'):
        for a in message.attachments:
          msg.attachmentNames.append(a[0])
          msg.attachmentContents.append(db.Blob(goodDecode(a[1])))
        msg.put()

And goodDecode is my workaround for the lowercasing bug in the mail API:

def goodDecode(encodedPayload):
  encoding = encodedPayload.encoding
  payload = encodedPayload.payload
  if encoding and encoding.lower() != '7bit':
    payload = payload.decode(encoding)
  return payload
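
One gap in the helper above: it only special-cases '7bit', so an encoding of '8bit' or 'binary' would be passed to decode() and fail with an unknown-encoding error. Below is a sketch of a variant that treats all three as "nothing to decode". FakePayload is a hypothetical stand-in for the mail API's EncodedPayload, and codecs.decode() is used only so the sketch runs outside App Engine; neither comes from the mail API.

```python
import codecs

# Transfer encodings that mean the payload is already the raw data.
IDENTITY_ENCODINGS = ('7bit', '8bit', 'binary')


class FakePayload(object):
    """Hypothetical stand-in for EncodedPayload, for illustration only."""

    def __init__(self, payload, encoding=None):
        self.payload = payload
        self.encoding = encoding


def good_decode_sketch(encoded_payload):
    # Lowercase the encoding name before comparing or decoding.
    encoding = (encoded_payload.encoding or '').lower()
    payload = encoded_payload.payload
    if not encoding or encoding in IDENTITY_ENCODINGS:
        return payload
    # e.g. 'base64' is a codec name Python already knows.
    return codecs.decode(payload, encoding)
```

For example, good_decode_sketch(FakePayload(b'aGVsbG8=', 'BASE64')) returns the decoded bytes, and a payload marked '8bit' comes back unchanged.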

The first error I got was:

in goodDecode
    encoding = encodedPayload.encoding
AttributeError: 'str' object has no attribute 'encoding'

OK, so apparently the message.attachments array might contain a str instead of 
an EncodedPayload in the [1] member?  That's not documented, AFAIK.  So I added 
this:

  if not hasattr(encodedPayload, 'encoding'):
    return encodedPayload

to the beginning of goodDecode().  That got me past the first error, and into 
this one:

in receive
    msg.attachmentNames.append(a[0])
TypeError: 'EncodedPayload' object is unindexable

It appears to me that the mail API is really screwing up these attachments.  It 
is putting an EncodedPayload where the name is supposed to be.
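
One reading that fits both tracebacks, offered as a guess rather than documented behavior: for a message with a single attachment, message.attachments may be one (name, payload) tuple instead of a list of such tuples. Iterating over it would yield first the name (a str, so a[1] is a one-character str) and then the EncodedPayload itself (which cannot be indexed). The sketch below normalizes both shapes; normalize_attachments is a hypothetical helper, not part of the mail API.

```python
try:
    string_types = basestring  # Python 2: covers str and unicode
except NameError:
    string_types = str  # Python 3


def normalize_attachments(attachments):
    """Return a list of (name, payload) pairs.

    Assumption: a lone attachment may arrive as a bare (name, payload)
    tuple rather than as a one-element list of tuples.
    """
    if (isinstance(attachments, tuple) and len(attachments) == 2
            and isinstance(attachments[0], string_types)):
        return [attachments]
    return list(attachments)
```

The loop would then read "for name, payload in normalize_attachments(message.attachments):", which also avoids the a[0] and a[1] indexing.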

Uncle!

-Joshua

On Nov 16, 2009, at 4:39 PM, Joshua Smith wrote:

> Just went through a little hell robustifying my inbound mail handler, and 
> thought some of you would find this interesting.
> 
> 1) Subject lines can contain line breaks.  Who knew?
> 
> It appears that subject lines can be word-wrapped in transit, and the inbound 
> mail handler leaves them wrapped.  I was parsing the subject with a RE, and, 
> of course, my .'s didn't match the \n, which led to various problems.
> 
> My fix:     subject = re.sub('[\n\r]+', '', message.subject)
> 
> 2) Decoded bodies are not necessarily valid unicode.
> 
> I will not pretend that I have the slightest understanding of how strings 
> work in Python.  Coming from the land of Java, I just find the whole thing a 
> confusing mess.
> 
> Anyway, what I learned is that if you have a db.TextProperty() and you expect 
> to stuff the result of payload.decode() into it, it might not work.  The 
> error I got was:
> 
>    'ascii' codec can't decode byte 0xbe in position 4228: ordinal not in range(128)
> 
> After trying a dozen different incantations to try to sanitize the message so 
> I could get *something* that perhaps a human could read, I finally found this 
> one:
> 
>    msg.message = allBodies.decode('ascii','replace').encode('ascii', 'backslashreplace')
> 
> I have no idea what that means, but it works.  No exception was generated, 
> and the message looks right to me.
> 
> -Joshua
> 
> --
> 
> You received this message because you are subscribed to the Google Groups 
> "Google App Engine" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to 
> [email protected].
> For more options, visit this group at 
> http://groups.google.com/group/google-appengine?hl=.
> 
> 
