I have found a couple of Scrubber issues in 2.1.9 which I have patched
on the Release_2_1-maint branch.

These issues involved a message with a message/delivery-status part
which parses into text/plain sub-parts with None payloads, and a
message with an improperly RFC 2047 encoded filename which had a
trailing null byte.
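
As a rough illustration of the two guards involved (a standalone Python 3
sketch, not the Mailman code itself; the names payload_text, clean_extension
and UNSAFE_CHARS are made up for this example):

```python
import re

# Hypothetical stand-in for the filter: drop every character that is not an
# ASCII alphanumeric, dash, underscore, or dot.
UNSAFE_CHARS = re.compile(r'[^-\w.]', re.ASCII)


def payload_text(payload):
    """Return the payload, or '' when the parser produced None."""
    return payload or ''


def clean_extension(ext):
    """Strip characters such as a trailing null byte from an extension."""
    return UNSAFE_CHARS.sub('', ext)


print(repr(payload_text(None)))
print(repr(clean_extension('.txt\x00')))
```

The first helper keeps later string operations from failing on a None
payload; the second keeps a stray null byte out of the name used to store
the attachment.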

In looking at how the Release_2_1-maint patches port to the Trunk, I've
seen some other problems.

I have developed the attached Scrubber.patch.txt patch for the Trunk,
which I think fixes both the problems I saw in 2.1.9 and the other
problems I found while porting.
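
One of the Trunk changes widens the exceptions caught while trying each
candidate charset for the joined text. The idea can be sketched in
standalone Python 3 as follows (encode_with_fallback is a hypothetical
helper, not a function in Scrubber.py, and the final UTF-8 fallback is an
assumption added for this example):

```python
def encode_with_fallback(text, charsets):
    """Encode text with the first charset that works.

    Returns a (bytes, charset) pair. A bogus charset can raise more than
    UnicodeError, so several exception types are caught.
    """
    for charset in charsets:
        try:
            return text.encode(charset), charset
        except (UnicodeError, LookupError, ValueError, TypeError,
                AssertionError):
            # Unknown, unusable, or unsuitable charset: try the next one.
            continue
    # Assumed last resort for this sketch only.
    return text.encode('utf-8', 'replace'), 'utf-8'


data, used = encode_with_fallback('caf\u00e9',
                                  ['us-ascii', 'bogus-charset', 'iso-8859-1'])
print(used, data)
```

Here 'us-ascii' fails with a UnicodeError and 'bogus-charset' with a
LookupError, so the loop moves on until a charset succeeds.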

I would appreciate others, particularly Tokio, looking at this to see
if it seems correct.

-- 
Mark Sapiro <[EMAIL PROTECTED]>       The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan

Index: Scrubber.py
===================================================================
--- Scrubber.py (revision 8236)
+++ Scrubber.py (working copy)
@@ -325,7 +325,7 @@
             # of this type, this resulted in a text/plain sub-part with a
             # null body. See bug 1430236.
             except (binascii.Error, TypeError):
-                t = part.get_payload()
+                t = part.get_payload() or ''
             # Email problem was solved by Mark Sapiro. (TK)
             partcharset = part.get_content_charset('us-ascii')
             try:
@@ -334,7 +334,7 @@
                     AssertionError):
                 # What is the cause to come this exception now ?
                 # Replace funny characters.  We use errors='replace'.
-                u = unicode(t, 'ascii', 'replace')
+                t = unicode(t, 'ascii', 'replace')
             # Separation is useful
             if isinstance(t, basestring):
                 if not t.endswith('\n'):
@@ -344,12 +344,11 @@
                 charsets.append(partcharset)
         # Now join the text and set the payload
         sep = _('-------------- next part --------------\n')
-        # The i18n separator is in the list's charset. Coerce it to the
-        # message charset.
+        # The i18n separator is in the list's charset. Coerce to unicode.
         try:
-            s = unicode(sep, lcset, 'replace')
-            sep = s.encode(charset, 'replace')
+            sep = unicode(sep, lcset, 'replace')
         except (UnicodeError, LookupError, ValueError):
+            # This shouldn't occur.
             pass
         rept = sep.join(text)
         # Replace entire message with text and scrubbed notice.
@@ -360,7 +359,9 @@
             try:
                 replace_payload_by_text(msg, rept, charset)
                 break
-            except UnicodeError:
+            # Bogus charset can throw several exceptions
+            except (UnicodeError, LookupError, ValueError, TypeError,
+                    AssertionError):
                 pass
         if format:
             msg.set_param('format', format)
@@ -404,6 +405,8 @@
         ext = fnext or guess_extension(ctype, fnext)
     else:
         ext = guess_extension(ctype, fnext)
+    # Allow only alphanumerics, dash, underscore, and dot
+    ext = sre.sub('', ext)
     if not ext:
         # We don't know what it is, so assume it's just a shapeless
         # application/octet-stream, unless the Content-Type: is