On Wed, 06 Mar 2002 11:01:51 -0800
"James J. Besemer" <[EMAIL PROTECTED]> wrote:
>
>[EMAIL PROTECTED] wrote:
>> You'd think!  I've had a couple of patches contributed that filter out
>> HTML, but I've not been able to whip them into shape for inclusion.
>> I've basically given up hope for MM2.1, but will look at it again for
>> the next release.  The problem is that the naive approach isn't
>> difficult, but for it to be robust is much more difficult.
>
>When you find more time I'd appreciate some more background on this.
>
>Wanting to filter out HTML (nb. from AOL accounts) is the #1 gripe from my
>users.
>
>The Python library has an HTML parser that I've used before and it works
>pretty well.  I used it to translate HTML to HTML, inserting data in
>various named fields.  But removal of the HTML is the default action of
>the code.  Of course you don't really want simply to remove it.  E.g.,
>you'd want to include HREF's somehow, substitute the description for
>images, etc.

Most of the time you really can just strip out the HTML.  AOL, Outhouse,
and most of the other clients that like to generate HTML put out
multipart/alternative messages that include a text/plain section, so
picking out the latter and dropping the other alternatives works pretty
well.  Almost all of the pure-HTML traffic I see is spam.

I've been using one of the patches Barry referred to on some medium-sized
lists for the past 1.5 years with no complaints and very few instances of
message bodies disappearing entirely.  (It was the release of AOL 6.0,
which doesn't allow turning off HTML, that prompted me.)

-les

_______________________________________________
Mailman-Developers mailing list
[EMAIL PROTECTED]
http://mail.python.org/mailman/listinfo/mailman-developers
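
For illustration, the approach described above (keep the text/plain
alternative, drop the others) might look roughly like the sketch below.
It is not the patch mentioned in this thread; it assumes Python's
standard email package, and the function name pick_plain_text is
hypothetical.  It returns None for a message with no text/plain part,
which is the "body disappears entirely" case, so a caller would have to
decide whether to hold, reject, or pass such a message through.

```python
from email import message_from_string


def pick_plain_text(msg):
    """Return the first text/plain body found in msg, or None.

    Hypothetical helper: walks the MIME tree depth-first and ignores
    every part that is not text/plain (text/html, images, and so on).
    """
    if msg.get_content_type() == "text/plain":
        # decode=True undoes base64/quoted-printable and yields bytes.
        raw = msg.get_payload(decode=True)
        charset = msg.get_content_charset() or "us-ascii"
        try:
            return raw.decode(charset, errors="replace")
        except LookupError:
            # Unknown charset label: fall back to a lossy ASCII decode.
            return raw.decode("us-ascii", errors="replace")
    if msg.is_multipart():
        for part in msg.get_payload():
            text = pick_plain_text(part)
            if text is not None:
                return text
    return None
```

A caller would parse the raw message first, for example with
message_from_string(raw_text), and pass the result to pick_plain_text.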
