On 7/16/02 2:37 PM, "Barry A. Warsaw" <[EMAIL PROTECTED]> wrote:
> CVR> the woodwork (I still run my Mailman box at home, however, so
> CVR> I'm not going away. JC, quit jeering)
>
> Congratulations! I think. ;)

Actually, yes. I won't be working 65+ hours a week any more, so I sort of get my life back, and may actually have time to think stuff through and do more than emergency patching... (for more, see <http://www.chuqui.com/cgi-bin/mwf/topic_show.pl?tid=348>). Also means I can actually start some non-Apple hacking again, I hope. And what I'll be doing is lots of fun, although the next six weeks is going to be a crunch. Still doing email, just off building a new custom system for stuff I can't talk about...

> CVR> One thing we're definitely doing is moving to a cloaked
> CVR> archive. Since we already distribute all archives out of
>
> So these are public archives that need to be scrubbed, right? Until
> now, Mailman has taken the approach that public archives are fed
> right off the file system by the http server. We could still do that
> if we scrubbed the messages before we archived them, although that
> doesn't help with existing archives unless you re-generate them.

Here's why I won't do that. I want to keep ONE set of archives. You can't scrub those archives, for two reasons. What if someone writes looking to get in contact with the author of a message? If the archive is scrubbed, that info is gone. And what if (god forbid) you get into a legal tangle? That's your legal record of what was said on the mail list and who said it. If you scrub it, and someone does something actionable or libelous and you get a court order to provide that data? You're hosed.

On a more likely note -- I can see where you might want the option to show the archives unscrubbed to validated users, and only scrub the public archives. As paranoid as I'm being today, I'd STILL like to find a way to let subscribed users see the archives unscrubbed. Which you could do by setting a cookie that the CGI could accept to change its behavior.

So I really like leaving the archives unmodified, and doing the scrubbing via CGI. It also allows you to do other things, like header cleanups (and you could potentially let a user set a cookie to define minimal or full headers, say...) and some quickie cleanup against unwrapped text and some other incidental archive glitches.

I come from a newspaper family, so I have a bias towards "you don't unpublish stuff, you don't change it once it's published". But I think there are good reasons to avoid sanitizing the archives, and instead sanitizing the delivery of those archives -- if only because if your policies change, all you need to change is the CGI. And it gives you the ability to set up different sets of abilities per user or per list if you want, too.

> So one question is: does the performance trade-off we made 5 years ago
> still make sense? Should we just be vetting all archives through a
> cgi, in which we can do fun stuff like cleanse it of email addresses?

One of the big things I dislike about Mhonarc is that archives are a rather low-usage system, but maintaining the Mhonarc index pages is a rather intensive use of system resources. Sort of like usenet -- you do a lot of work on everything, in case someone wants anything. I think simply storing the archives and sanitizing on demand is lower overhead. It also means pipermail won't need ANY changes -- you simply feed it out through the CGI instead of directly, and everything magically sanitizes...

> We'd obviously have to get rid of the easy access to the raw mbox
> file, so another question is whether that's still useful.

Honestly? I don't think so. I find them real kludgy. I ended up doing a new archiving system (one file per message) via a perl script. We're about to take our new search engine out of beta with the thing, finally.

> Also, what heuristic do you use to search for email addresses, and
> what do you scrub them with?

Still being worked on.
Right now, I'm basically doing a <wordboundary><nonwhitespace>@<nonwhitespaceordot><dot><nonwhitespace><wordboundary>. I don't know how strongly we'll refine it.

> Do you want to attempt to obscure the
> address (e.g. "barry--at--python--dot--org")

Anything you programmatically obscure will be programmatically de-obscured. This technique is bogus and guaranteed to fail as soon as the spammers care enough. It's pretty clear even the "randomized obscuring" of slashdot is a failed technique, since spambots don't have to decode ALL of those formats, just some of them, and then cycle through the site enough times....

Sorry, I find this is a false security. Makes the users feel better, accomplishes nothing useful, so in reality, users get lazy and careless. So to some degree, I feel it's worse than nothing.

I'm planning on replacing email addresses with something useful like [email address deleted].

> CVR> disclosing that info. It creates other problems -- you can't
> CVR> see a posting in the archive and send email to that person
> CVR> with more questions (or answers), but that seems trivial
> CVR> compared to the problems the spammers are causing.
>
> It kind of plays into Reply-To: munging doesn't it? If you won't be
> able to reply to the original author, because we're anonymizing
> messages, then you might as well munge Reply-To: to go back to the
> list because that's the only posting address that makes sense.

Yes (he says, grimacing). If you sanitize the archives, I don't think it affects the list. There are simply NO mailtos any more in the archives. If you go the step further and anonymize the postings ON the list, so subscriber email addresses simply are never shown to other subscribers under any circumstances (ugh. Urp. I can't believe I'm saying that. This is so anti-community it hurts), you have no choice and reply-to has to point to the list, since it's the only contact point left.

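Going back to the archive side for a moment: the scrub-on-delivery idea (leave the stored archive alone, replace anything address-shaped with [email address deleted] on the way out, and let validated subscribers see the raw text) could be sketched roughly as below. This is only an illustration in Python, not the script actually in use. The pattern is an assumed, somewhat tighter cousin of the heuristic above, the names `scrub` and `render_message` are made up for the example, and how a subscriber gets validated (a cookie, say) is left out entirely.

```python
import re

# Assumed approximation of the heuristic: word boundary, local part, "@",
# dotted domain, word boundary. Hypothetical; not the pattern any real
# site is known to use.
ADDRESS_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
PLACEHOLDER = "[email address deleted]"


def scrub(text):
    """Replace anything that looks like an email address with a placeholder."""
    return ADDRESS_RE.sub(PLACEHOLDER, text)


def render_message(raw_text, is_validated_subscriber=False):
    """Return the text to serve; the stored archive copy is never modified."""
    if is_validated_subscriber:
        # Validated subscribers see the archive exactly as stored.
        return raw_text
    # Everyone else gets the scrubbed view, generated on demand.
    return scrub(raw_text)


if __name__ == "__main__":
    stored = "From: Barry <barry@python.org>\nMail me at barry@python.org."
    print(render_message(stored))
    print(render_message(stored, is_validated_subscriber=True))
```

Because the scrubbing happens only in `render_message`, a policy change (a different placeholder, a stricter pattern, per-list rules) touches the delivery code and nothing in the stored archive.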
If you instead turn the list server into a forwarding agent, as in:

> Or should Mailman get into the anonymous resender game? There's
> probably a lot we could do here, but given the political risks of
> anonymous resenders, do we even want to go there?

Is it an anonymous remailer? We're making no pretense of anonymity here. We're acting as a forwarding agent, ala hotmail.com or mac.com. You mail to [EMAIL PROTECTED], and it ends up in my mailbox. The fact that we're not explicitly denoting the real email address doesn't make us an anonymous remailer -- that'd be a policy issue, actually. I suppose you could take it that step further, but you could also set it up so validated subscribers could get to the real addresses.

The model I'm thinking of is like many forum systems. If you're a guest, you don't get access to email info. If you're a subscriber, you log on, and they magically appear. In the case of mailing lists, since you lose control of the e-mail address once it leaves the site again, you handle this by only using the remailer address in mail that leaves the site, but a subscriber could go to the list system and look a user up.

That gets us away from the politics of the anonymous stuff.

> CVR> A secondary issue here is the problem of disclosing admins
> CVR> and admin addresses.
>
> Note that in MM2.1 we go about 1/2 way here. We include the obscured
> email addresses of the list owners as the text in a mailto: tag but we
> actually use the list-owner@ address as the mailto: target. That
> might not be enough though. When we actually have a Real Database
> backend we can keep a roster of email+realname and then just include
> the realname inside the href:mailto tag.

I think six months ago it was enough. Now, I just don't think it is. Sigh. Grumble.

> Have you looked at SpamAssassin Chuq?

See my other message.
SA is a good tool, if you have someone around willing to update it, monitor it, and make sure it stays up to date technologically with current releases that are updated to match the spammers' changes.

Do you want to require SA to be installed as a requirement for Mailman? What about sites where they don't have an admin to keep updating it? SA is only as good as its latest release is at blocking spam. So you have to keep updating it. Is that a realistic (and ultimately successful) strategy?

I HATE WHITELISTS. But in the case of public addresses, I'm now convinced they're needed, because otherwise, you're committing to an ever-escalating war to stay ahead of the spammers. At best, that's going to cost continuing manpower and energy and be zero sum. You won't win, you simply continue surviving by sticking thumbs in the dike.

> Very few false positives too (usually it's
> email amongst our postmasters talking about spam or SA ;).

All it takes is one. Have you seen these stories?

>>Some stuff I've run across while digging out from being on vacation...
>>
>>An interesting take on collaborative anti-spam issues -- that forging email headers to test/validate an open relay is an illegal trespass on a mail server:
>>
>><http://www.newarchitectmag.com/documents/s=2442/na0802g/index.html>
>>
>>Lincoln Stein saying the heck with it and deciding that manual filtering is better than the alternatives:
>>
>><http://www.newarchitectmag.com/documents/s=2445/na0802h/index.html>
>>
>>And in case you didn't see it, cNet's article on why the RBLs are creating false positive problems. It really looks like the blackhole systems have now hit a critical mass where they're being noticed, and not favorably. The folks at SPEWS, if you read what has happened through their stuff and how their attitude leaked all over their responses, haven't helped their cause much.
>>
>><http://news.com.com/2100-1023-943337.html?tag=fd_lede>
>>
>>Finally, another article, this from TidBits, about the growing problem of BAD filtering and false positives, and how it creates another set of (probably even worse) problems.....
>>
>><http://db.tidbits.com/getbits.acgi?tbart=06866>

Also:

>> http://news.com.com/2100-1023-943337.html?tag=fd_lede

> CVR> @#$@$#@$#23 caches (which are nice in some ways) for not also
> CVR> creating some way to flag addresses as not
> CVR> cacheable. Because, IMHO, that'd solve this problem.
>
> Yup, but of course it implies that the clients play by the rules, and
> we know that they don't all, so the question is what we're willing to
> give up for the security of our online personas. Kinda mirrors
> today's large questions in the WoT(tm), eh? Maybe people are more
> willing to give up their rights than their conveniences for some added
> security.

Yeah. I see your Sigh and raise you.

> World domination of course. Because we /could/ add that stuff fairly
> easily if we had the resources to expend on it. Would it still be
> useable? For some audiences yes, others no. I'm fairly sure the
> kind of anonymizing we're talking about would never fly in the Python
> and Zope community, whereas it's probably essential in a less
> cloistered environment like lists.apple.com. Which leads me to
> believe that we need to make it much easier to install themes or
> styles of lists, from the paranoid anonymizer to the laissez-faire
> discussion list.

You have nailed it on the head. Which is why I brought it up. Not because this is the way it has to be in the future, but because all this is making Mailman's job a whole lot more complex (we were whining about that at work today, or at least I was and everyone was nodding sympathetically and looking for an open window -- email used to be pretty easy and straightforward. And now.....).
But not just because all this crap is getting in the way, but also that fixing this crap is overkill for some environments, and going to be NOT ENOUGH in others.

> CVR> Happy Macworld Expo week, all. If you need me, I'll be in the
> CVR> war room, beating my head against a wall.
>
> Any chance you could make it down to DC for a side trip? We could
> have a Mailman hacking sprint over a few dozen steamed Maryland blue
> crabs and some cold ones. :)

Damn, that sounds good, but -- I've had to give up crab and shellfish (I've developed an intermittent sensitivity to it. Sigh!) and I'm staying in Cupertino where I'll be manning the war room this week making sure buttons get pushed when they need pushed, and not a minute before....

--
Chuq Von Rospach, Architech
[EMAIL PROTECTED] -- http://www.chuqui.com/

No! No! Dead girl, OFF the table! -- Shrek

_______________________________________________
Mailman-Developers mailing list
[EMAIL PROTECTED]
http://mail.python.org/mailman-21/listinfo/mailman-developers