On 5/8/2010 1:05 PM, Lindsay Haisley wrote:
>
> The poster used an "Approved" pseudo-header. Mailman found the
> pseudo-header in the text/plain part, removed it, and approved the post
> for distribution. However in the text/html portion, the pseudo-header
> was mucked up with markup and was apparently unrecognizable to Mailman.
> It shows up in the message source as:
>
> <p style=3D"margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Arial">Approved: =
> =A0Hon94Bar</p>
>
> For rather obvious reasons, Mailman didn't find this rendition of the
> pseudo-header, but because it found the Approved pseudo-header in the
> text/plain portion, it distributed the message - with the administrator
> password clearly displayed to the subscriber list for everyone with an
> HTML-capable mail reader to see! Now this (very technically challenged)
> customer has to change her list admin password and I have to work with
> her to insure that this won't happen again.
>
> HTML-ized email is a real PITA, and we've had problems with the
> pseudo-header before. It seems to me that if a submitted email has both
> a text/plain and a text/html part, Mailman should look _first_ for the
> pseudo-header in the text/html portion, and if it's not found there, the
> post should be rejected at that point even if the pseudo-header is
> clearly present in a text/plain part. These two sections are supposed to
> be identical as far as content goes, or at least we can expect Mailman
> to assume that they are.
>
> How can this be prevented? As far as I'm concerned, this is a bug.

It is a bug, <https://bugs.launchpad.net/mailman/+bug/266220>. My
comments in the code say

    # MAS: Bug 1181161 - Now try all the text parts in case it's
    # multipart/alternative with the approved line in HTML or other
    # text part. We make a pattern from the Approved line and delete
    # it from all text/* parts in which we find it. It would be
    # better to just iterate forward, but email compatability for pre
    # Python 2.2 returns a list, not a true iterator.
    #
    # This will process all the multipart/alternative parts in the
    # message as well as all other text parts. We shouldn't find the
    # pattern outside the mp/a parts, but if we do, it is probably
    # best to delete it anyway as it does contain the password.
    #
    # Make a pattern to delete. We can't just delete a line because
    # line of HTML or other fancy text may include additional message
    # text. This pattern works with HTML. It may not work with rtf
    # or whatever else is possible.

So the question is why this fails in this case. The HTML part is
clearly QP encoded, but we decode that and it decodes to

    <p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Arial">Approved: \xA0Hon94Bar</p>

where the \xA0 is the hex representation of the actual character, which
is a no-break space. The issue is that the pattern constructed in this
case is

    'Approved:(\s|&nbsp;)*Hon94Bar'

and the re.sub(pattern, '', lines) (where lines is the message body)
does not consider \xA0 to match \s.

This is clearly a deficiency in the code, but there are two underlying
issues:

1) the user double spaced between the Approved: and the password, and

2) the user's MUA encoded the two spaces as a space followed by a
no-break space for the HTML part, but it represented the no-break space
as a raw character code instead of the HTML entity &nbsp;.

Had either of the above conditions not been true, the Approved:
password would have been removed.

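To make the failure concrete, here is a minimal sketch that reproduces it. This is not the actual Mailman code: the names NAME, PASSWORD and html_part are made up for illustration, it assumes the second alternative in the pattern is the &nbsp; entity, and it uses Python 3 bytes patterns to mimic the 8-bit string matching of the Python 2 re module, where \s only covers ASCII whitespace.

```python
import re

# Hypothetical stand-ins for the values in this thread; the real code
# derives them from the pseudo-header name and the list password.
NAME = b"Approved"
PASSWORD = b"Hon94Bar"

# The HTML part after quoted-printable decoding: an ordinary space
# followed by a raw no-break space byte (0xA0), then the password.
html_part = (
    b'<p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Arial">'
    b"Approved: \xa0Hon94Bar</p>"
)

# Pattern as described above: whitespace or the &nbsp; entity only.
old_pattern = NAME + rb":(\s|&nbsp;)*" + re.escape(PASSWORD)

# Same pattern with the raw 0xA0 byte added as a further alternative.
new_pattern = NAME + rb":(\xA0|\s|&nbsp;)*" + re.escape(PASSWORD)

# With a bytes pattern, \s matches ASCII whitespace only, so the 0xA0
# byte stops the match and nothing is removed.
old_result = re.sub(old_pattern, b"", html_part)

# With 0xA0 listed explicitly, the whole pseudo-header is removed.
new_result = re.sub(new_pattern, b"", html_part)

print(PASSWORD in old_result)
print(PASSWORD in new_result)
```

Note that running the same patterns against Python 3 str objects would hide the problem, because \s in a str pattern matches Unicode whitespace, including the no-break space.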
I will modify the code to add \xA0 to make the pattern

    'Approved:(\xA0|\s|&nbsp;)*Hon94Bar'

in this case, which will work for this one and future ones like it, but
I won't follow your suggestion to check the HTML first. I think this is
unworkable without implementing an HTML rendering engine, and would
likely be no different, at least in some cases, from just not checking
for the pseudo-header in the message body at all.

Note that we have never guaranteed removal of the pseudo-header from
alternative parts, and if asked, I always recommend a true message
header for this purpose.

-- 
Mark Sapiro <m...@msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan