I decided to try to understand that weird decode/encode incantation, and
learned about the "errors" keyword argument to the unicode function. So now I
have:
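bodies = message.bodies(content_type='text/html')
allBodies = u""
for body in bodies:
    # body is a (content_type, EncodedPayload) pair; errors="ignore" drops
    # any byte the default (ASCII) codec can't decode.
    allBodies = allBodies + u"\n" + unicode(goodDecode(body[1]),
                                            errors="ignore")
if not allBodies:
    # No HTML part, so fall back to the plain-text parts.
    bodies = message.bodies(content_type='text/plain')
    for body in bodies:
        allBodies = allBodies + u"\n" + unicode(goodDecode(body[1]),
                                                errors="ignore")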
This gets rid of the exception.
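For readers on Python 3, where `str` is Unicode text and `bytes` is raw data (Python 2's `unicode` and `str`, respectively), the same prefer-HTML-then-fall-back-to-plain-text logic can be sketched as below. `join_bodies` is a hypothetical helper, and it assumes each part has already been transfer-decoded to bytes. Like Python 2's `unicode(data, errors="ignore")` with no encoding given, it decodes as ASCII and silently drops every byte that isn't ASCII.

```python
from typing import Iterable


def join_bodies(html_parts: Iterable[bytes], plain_parts: Iterable[bytes]) -> str:
    """Join body parts, preferring HTML and falling back to plain text.

    Hypothetical helper: each part is assumed to be transfer-decoded bytes.
    errors="ignore" drops any byte that is not valid ASCII.
    """
    joined = ""
    for part in html_parts:
        joined += "\n" + part.decode("ascii", errors="ignore")
    if not joined:
        # No HTML parts at all, so use the plain-text parts instead.
        for part in plain_parts:
            joined += "\n" + part.decode("ascii", errors="ignore")
    return joined


# The 0xbe byte is silently dropped.
print(repr(join_bodies([], [b"caf\xbe ok"])))  # prints '\ncaf ok'
```

The trade-off is that nothing is reported: the text loses characters without any sign that it happened.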
This particular email message also gives me errors when I try to deal with the
attachments. Here's my code:
if hasattr(message, 'attachments'):
    for a in message.attachments:
        msg.attachmentNames.append(a[0])
        msg.attachmentContents.append(db.Blob(goodDecode(a[1])))
msg.put()
And goodDecode is my workaround for the lowercasing bug in the mail API:
def goodDecode(encodedPayload):
    encoding = encodedPayload.encoding
    payload = encodedPayload.payload
    if encoding and encoding.lower() != '7bit':
        payload = payload.decode(encoding)
    return payload
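`goodDecode` compares only against `7bit`, so an `8bit` or `binary` part would be handed to `str.decode`, which would likely raise `LookupError` in Python 2 because neither is a codec name. RFC 2045 defines five Content-Transfer-Encoding values, matched case-insensitively: `7bit`, `8bit` and `binary` (all identity encodings), plus `quoted-printable` and `base64`. A Python 3 sketch that handles all five; `decode_transfer_encoding` is a hypothetical stand-in that works on raw bytes, not on App Engine's `EncodedPayload`:

```python
import base64
import quopri
from typing import Optional


def decode_transfer_encoding(payload: bytes, encoding: Optional[str]) -> bytes:
    """Undo a MIME Content-Transfer-Encoding (RFC 2045), case-insensitively.

    Hypothetical helper; a missing encoding is treated as 7bit, the RFC default.
    """
    name = (encoding or "7bit").strip().lower()
    if name in ("7bit", "8bit", "binary"):
        # Identity encodings: the bytes are already in their final form.
        return payload
    if name == "base64":
        return base64.b64decode(payload)
    if name == "quoted-printable":
        return quopri.decodestring(payload)
    raise ValueError("unknown Content-Transfer-Encoding: %r" % encoding)


print(decode_transfer_encoding(b"aGVsbG8=", "BASE64"))        # b'hello'
print(decode_transfer_encoding(b"caf=E9", "Quoted-Printable"))  # b'caf\xe9'
```

Raising on an unknown name, rather than passing the bytes through, makes a surprising header visible instead of silently storing undecoded data.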
The first error I got was:
in goodDecode
encoding = encodedPayload.encoding
AttributeError: 'str' object has no attribute 'encoding'
OK, so apparently the message.attachments array might contain a str instead of
an EncodedPayload in the [1] member? That's not documented, AFAIK. So I added
this:
if not hasattr(encodedPayload, 'encoding'):
    return encodedPayload
to the beginning of goodDecode(). That got me past the first error, and into
this one:
in receive
msg.attachmentNames.append(a[0])
TypeError: 'EncodedPayload' object is unindexable
It appears to me that the mail API is really screwing up these attachments. It
is putting an EncodedPayload where the name is supposed to be.
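Both tracebacks are consistent with one hypothesis, which is not confirmed by anything in this thread: for a message with a single attachment, `message.attachments` may be a bare `(filename, payload)` tuple rather than a list of such tuples. Iterating over that tuple yields the filename first, so `a[1]` is its second character (a `str`, hence the first error), and then the payload itself, which can't be indexed (hence the second). A Python 3 sketch that normalizes both shapes; `normalize_attachments` and `FakePayload` are hypothetical names used only for illustration, and under Python 2 the type check would use `basestring` instead of `str`:

```python
from typing import Any, List, Tuple


class FakePayload:
    """Hypothetical stand-in for App Engine's EncodedPayload."""

    def __init__(self, payload: bytes, encoding: str) -> None:
        self.payload = payload
        self.encoding = encoding


def normalize_attachments(attachments: Any) -> List[Tuple[str, Any]]:
    """Always return a list of (filename, payload) pairs.

    Assumes a lone attachment may arrive as a bare (filename, payload)
    tuple instead of a one-element list.
    """
    if not attachments:
        return []
    if (isinstance(attachments, tuple) and len(attachments) == 2
            and isinstance(attachments[0], str)):
        return [attachments]
    return list(attachments)


single = ("report.pdf", FakePayload(b"...", "base64"))
many = [("a.txt", FakePayload(b"a", "7bit")), ("b.txt", FakePayload(b"b", "7bit"))]

print([name for name, _ in normalize_attachments(single)])  # ['report.pdf']
print([name for name, _ in normalize_attachments(many)])    # ['a.txt', 'b.txt']
```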
Uncle!
-Joshua
On Nov 16, 2009, at 4:39 PM, Joshua Smith wrote:
> Just went through a little hell robustifying my inbound mail handler, and
> thought some of you would find this interesting.
>
> 1) Subject lines can contain line breaks. Who knew?
>
> It appears that subject lines can be word-wrapped in transit, and the inbound
> mail handler leaves them wrapped. I was parsing the subject with a RE, and,
> of course, my .'s didn't match the \n, which led to various problems.
>
> My fix: subject = re.sub(r'[\r\n]+', '', message.subject)
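The wrapping described here is header folding: RFC 5322 (section 2.2.3) lets a long header line be broken by inserting a CRLF before whitespace, and unfolding is done by removing any CRLF that is immediately followed by a space or tab. A Python 3 sketch of that narrower rule; `unfold` is a hypothetical helper, and accepting a bare LF as well as CRLF is an assumption about what the inbound handler delivers:

```python
import re


def unfold(header_value: str) -> str:
    """Unfold a folded header value per RFC 5322, section 2.2.3.

    Removes each line break that is immediately followed by a space or tab
    and keeps that whitespace. Bare LF is accepted as well as CRLF.
    """
    return re.sub(r"\r?\n(?=[ \t])", "", header_value)


print(unfold("Re: weekly report\r\n for November"))  # Re: weekly report for November
```

For correctly folded headers this gives the same result as stripping every CR and LF, because the whitespace after the break survives either way; the difference only shows up on malformed headers with line breaks not followed by whitespace.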
>
> 2) Decoded bodies are not necessarily valid unicode.
>
> I will not pretend that I have the slightest understanding of how strings
> work in Python. Coming from the land of java, I just find the whole thing a
> confusing mess.
>
> Anyway, what I learned is that if you have a db.TextProperty() and you expect
> to stuff the result of payload.decode() into it, it might not work. The
> error I got was:
>
> 'ascii' codec can't decode byte 0xbe in position 4228: ordinal not in range(128)
>
> After trying a dozen different incantations to try to sanitize the message so
> I could get *something* that perhaps a human could read, I finally found this
> one:
>
> msg.message = allBodies.decode('ascii', 'replace').encode('ascii', 'backslashreplace')
>
> I have no idea what that means, but it works. No exception was generated,
> and the message looks right to me.
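Here is what that chain does, sketched in Python 3 where the Python 2 byte string becomes `bytes` (`sanitize` is a hypothetical name). `decode('ascii', 'replace')` turns every non-ASCII byte into U+FFFD, the Unicode replacement character; `encode('ascii', 'backslashreplace')` then writes each non-ASCII character as a literal `\ufffd` escape. The result is pure ASCII and can't raise, but every non-ASCII byte comes out as the same six characters. If the part's charset is known, decoding with it keeps the text; the second function sketches that alternative, assuming `charset` is taken from the part's Content-Type header.

```python
from typing import Optional


def sanitize(raw: bytes) -> bytes:
    """Reproduce the decode/encode chain from the post."""
    text = raw.decode("ascii", "replace")            # 0xbe -> U+FFFD
    return text.encode("ascii", "backslashreplace")  # U+FFFD -> b"\\ufffd"


def decode_with_charset(raw: bytes, charset: Optional[str]) -> str:
    """Decode with the declared charset, replacing only undecodable bytes.

    Hypothetical helper; falls back to ASCII when no charset is declared
    or the charset name is unknown.
    """
    try:
        return raw.decode(charset or "ascii", "replace")
    except LookupError:
        return raw.decode("ascii", "replace")


print(sanitize(b"caf\xbe"))  # b'caf\\ufffd'
```

With `charset="latin-1"`, the same bytes decode to `caf¾` (0xbe is U+00BE in Latin-1) instead of a replacement escape.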
>
> -Joshua
>
> --
>
> You received this message because you are subscribed to the Google Groups
> "Google App Engine" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to
> [email protected].
> For more options, visit this group at
> http://groups.google.com/group/google-appengine?hl=.
>
>