Re: [Email-SIG] Patch: Improve recognition of attachment file name, with encodings

Nando Sat, 16 Feb 2008 13:25:23 -0800

Looks like I forgot to attach the patch. Sorry. Here it is.


Nando Florestan
===============
[skype]    nandoflorestan
[phone]  + 55 (11) 3675-3038
[mobile] + 55 (11) 9820-5451
[internet] http://oui.com.br/
[À Capela] http://acapela.com.br/
[location] São Paulo - SP - Brasil



Nando wrote:

Decoding of RFC 2047 encoded filenames... I attach an updated patch. Now it is off by default, but can be enabled by flipping a flag. I have updated the docstring for the get_filename() method. Let me know if I am forgetting something.
Two questions:
1) I have done this for the get_filename() method only. The flag that needs to be set is called *garbage_filename_decoding*. Look, it says "filename" in there. But are there any other parameters where the improper usage of RFC 2047 also commonly occurs? If so, maybe a single flag for all of them would be more appropriate...
2) Is there some flaw in decode_header()? Something that Thunderbird displays as "Eduardo & Mônica" is being decoded with the wrong character in place of the ô:
repr(decode_header(m["subject"])[0][0])
'Eduardo & M\xf4nica'
The header being tested is:
Subject: =?iso-8859-1?Q?Eduardo_&_M=F4nica?=
In case we are again doing the Right Thing, then why does Thunderbird display it the way it was intended?
I am not familiar with the RFCs. When I read Stephen Turnbull's message explaining that these are in fact malformed messages, I was very worried. (I want the email library to just work...) Fortunately we can do the right thing by default, while still supporting decoding of the malformed messages.
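
A possible reading of question 2, sketched under the assumption of a current Python 3 interpreter (the session above is from an older release, where the result was a byte string): decode_header() returns (data, charset) pairs and leaves the data undecoded, so '\xf4' may simply be the iso-8859-1 byte for ô that has not yet been converted to text. The helper name decode_to_text is made up for this illustration and is not part of the email package.

```python
from email.header import decode_header

# The Subject value quoted in the message above.
raw_subject = "=?iso-8859-1?Q?Eduardo_&_M=F4nica?="


def decode_to_text(value):
    """Hypothetical helper: join decode_header() output into one text string."""
    parts = []
    for data, charset in decode_header(value):
        if isinstance(data, bytes):
            # Encoded words come back as raw bytes plus the charset name;
            # the caller still has to decode them. Unlabelled bytes are
            # assumed here to be ASCII.
            parts.append(data.decode(charset or "ascii", "replace"))
        else:
            # A header with no encoded words is returned as a plain str.
            parts.append(data)
    return "".join(parts)


decoded_pairs = decode_header(raw_subject)  # list of (data, charset) pairs
subject_text = decode_to_text(raw_subject)
print(subject_text)
```

If that reading is right, Thunderbird and decode_header() agree on the bytes; the difference is only that the mail client performs the final charset decoding before display.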
I hope you can approve this small patch...

Index: message.py
===================================================================
--- message.py  (revision 60758)
+++ message.py  (working copy)
@@ -16,6 +16,7 @@
 import email.charset
 from email import utils
 from email import errors
+from email.header import decode_header
 
 SEMISPACE = '; '
 
@@ -103,6 +104,7 @@
         self._unixfrom = None
         self._payload = None
         self._charset = None
+        self.garbage_filename_decoding = False
         # Defaults for multipart messages
         self.preamble = self.epilogue = None
         self.defects = []
@@ -665,14 +667,27 @@
         The filename is extracted from the Content-Disposition header's
         `filename' parameter, and it is unquoted.  If that header is missing
         the `filename' parameter, this method falls back to looking for the
-        `name' parameter.
+        `name' parameter. Failing this, the Content-Type header is checked
+        for its `name' parameter.
+        
+        RFC 2231 determines the way in which parameters should be encoded.
+        It specifically forbids the use of RFC 2047 encodings in parameters.
+        (RFC 2047 deals with encoding of headers.) However, a few
+        mail agents do exactly the wrong thing. You can enable RFC 2047
+        decoding of filenames by flipping an instance flag:
+        
+            my_message.garbage_filename_decoding = True
         """
         missing = object()
         filename = self.get_param('filename', missing, 'content-disposition')
         if filename is missing:
             filename = self.get_param('name', missing, 'content-disposition')
         if filename is missing:
+            filename = self.get_param('name', missing, 'content-type')
+        if filename is missing:
             return failobj
+        if self.garbage_filename_decoding:
+            filename = decode_header(filename)[0][0]
         return utils.collapse_rfc2231_value(filename).strip()
 
     def get_boundary(self, failobj=None):
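
For readers who want to try the idea without patching message.py, here is a rough standalone sketch of the same lookup order with opt-in RFC 2047 decoding. It assumes Python 3 and the default message policy; the function name get_filename_lenient and its decode_rfc2047 argument are invented for this illustration and are not part of the patch or of the email package. Unlike the patch, it decodes every part returned by decode_header() rather than only the first.

```python
from email import message_from_string
from email.header import decode_header
from email.utils import collapse_rfc2231_value


def get_filename_lenient(msg, failobj=None, decode_rfc2047=False):
    """Hypothetical helper mirroring the lookup order in the patch above."""
    missing = object()
    filename = msg.get_param('filename', missing, 'content-disposition')
    if filename is missing:
        filename = msg.get_param('name', missing, 'content-disposition')
    if filename is missing:
        filename = msg.get_param('name', missing, 'content-type')
    if filename is missing:
        return failobj
    # Collapse a possible RFC 2231 (charset, language, value) tuple first.
    filename = collapse_rfc2231_value(filename).strip()
    if not decode_rfc2047:
        return filename
    # Opt-in handling of the non-conforming RFC 2047 form.
    parts = []
    for data, charset in decode_header(filename):
        if isinstance(data, bytes):
            parts.append(data.decode(charset or 'ascii', 'replace'))
        else:
            parts.append(data)
    return ''.join(parts)


sample = message_from_string(
    'Content-Disposition: attachment; '
    'filename="=?iso-8859-1?Q?M=F4nica.txt?="\n'
    '\n'
    'body\n'
)
print(get_filename_lenient(sample))                       # left encoded
print(get_filename_lenient(sample, decode_rfc2047=True))  # decoded to text
```

An unknown charset label in the encoded word would raise LookupError in this sketch; a real implementation would probably want to catch that.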
