>>>>> "CVR" == Chuq Von Rospach <[EMAIL PROTECTED]> writes:
CVR> First, a minor announcement. I'm no longer in charge of the
CVR> mailing lists at Apple, sort of. We've hired a person
CVR> full-time, and he's been taking over the lists server as his
CVR> full-time responsibility, allowing me to go off and work on
CVR> other projects. I'm still in the loop, just not "it". I'm
CVR> still going to be heavily involved as we move that box to
CVR> Mailman 2.1, and after that, probably fade a bit more into
CVR> the woodwork (I still run my Mailman box at home, however, so
CVR> I'm not going away. JC, quit jeering)

Congratulations! I think. ;)

CVR> One thing we're definitely doing is moving to a cloaked
CVR> archive. Since we already distribute all archives out of
CVR> HTTP, not FTP, we're working on a CGI that'll strip all
CVR> e-mail information out of messages on the fly (among other
CVR> things, like header cleanup and some trivial formatting
CVR> fixes). The idea is simple -- we've finally hit the point
CVR> where you can't put an e-mail address up on a public site
CVR> under any circumstance safely, so we're having to move to a
CVR> system where we simply don't do that.

So these are public archives that need to be scrubbed, right? Until
now, Mailman has taken the approach that public archives are fed
right off the file system by the HTTP server. We could still do that
if we scrubbed the messages before we archived them, although that
doesn't help with existing archives unless you re-generate them.

So one question is: does the performance trade-off we made 5 years
ago still make sense? Should we just be vetting all archives through
a CGI, in which we can do fun stuff like cleanse them of email
addresses?

We'd obviously have to get rid of the easy access to the raw mbox
file, so another question is whether that's still useful.
Occasionally it's damn handy if you're moving a list or gathering
statistics on it, but on the other hand, it's a rich source of
addresses to mine.
Again, if we scrubbed the messages pre-archiving we'd likely be OK.

Also, what heuristic do you use to search for email addresses, and
what do you scrub them with? Do you want to attempt to obscure the
address (e.g. "barry--at--python--dot--org"), replace it altogether
(e.g. "[hidden email address]"), or maybe just replace it with a
truncation (e.g. "[localpart's email address]")?

CVR> I think the Mailman stuff needs to think about this, also. It
CVR> impacts the archiving setup and other issues, but the
CVR> harvesters have hit the point where we simply can't risk
CVR> disclosing that info. It creates other problems -- you can't
CVR> see a posting in the archive and send email to that person
CVR> with more questions (or answers), but that seems trivial
CVR> compared to the problems the spammers are causing.

It kind of plays into Reply-To: munging, doesn't it? If you won't be
able to reply to the original author, because we're anonymizing
messages, then you might as well munge Reply-To: to go back to the
list because that's the only posting address that makes sense. And
what if the original poster isn't a member of the list?

Or should Mailman get into the anonymous resender game? There's
probably a lot we could do here, but given the political risks of
anonymous resenders, do we even want to go there?

CVR> A secondary issue here is the problem of disclosing admins
CVR> and admin addresses.

Note that in MM2.1 we go about 1/2 way here. We include the obscured
email addresses of the list owners as the text in a mailto: tag but
we actually use the list-owner@ address as the mailto: target. That
might not be enough though. When we actually have a Real Database
backend we can keep a roster of email+realname and then just include
the realname inside the href:mailto tag.

CVR> I know we've hashed that through once, but we've come to the
CVR> (somewhat reluctant) decision to whitelist all public,
CVR> non-personal email addresses.
CVR> We're going to be implementing
CVR> TMDA to do this, and will be switching all admin to generic
CVR> addresses that filter through TMDA, as well as things like
CVR> postmaster@ and the like. I hate making users jump
CVR> through hoops to get through to a real person (for those that
CVR> don't know, TMDA is an overt whitelist. If you're not on the
CVR> whitelist, you get mail back telling you to take some action,
CVR> and until you do, the mail isn't delivered), but the abuse by
CVR> the spammers on admin addresses is now so bad I'm declaring
CVR> defeat and going to the whitelist.

Have you looked at SpamAssassin, Chuq? It's really done wonders to
reduce the amount of spam actually getting through any python.org or
zope.org address. I know 'cause I see the daily reports of
quarantined messages. Very few false positives too (usually it's
email amongst our postmasters talking about spam or SA ;). I feel a
lot better about this approach than TMDA'ing essential addresses
like postmaster or mailman-owner.

CVR> I'm going to look and see if I can interface TMDA to the
CVR> subscriber databases so that subscribers are by definition
CVR> whitelisted, but we've hit the point where we have to do
CVR> this. I'm not happy about it, but the war is lost, I think.

Sigh.

CVR> So what he did was open up his address book and send his
CVR> message to everyone in it. And he's running one of these new
CVR> e-mail clients that happily caches addresses it sees in case
CVR> you want them again. So all of the addresses of people
CVR> posting to the mailing lists he subscribed to were in his
CVR> address book cache, so when he grabbed his address book, he
CVR> grabbed all of those addresses, too. Wonderful.

I think this has been presaged by Klez, which does essentially the
same thing w/o human intervention or such good intent.
;)

CVR> But now we're wondering if we have to go to some sort of
CVR> address cloaking ON lists, maybe some kind of address
CVR> remapping through the server for replies, something. And I'm
CVR> gritting my teeth at the developers who created those
CVR> @#$@$#@$#23 caches (which are nice in some ways) for not also
CVR> creating some way to flag addresses as not
CVR> cacheable. Because, IMHO, that'd solve this problem.

Yup, but of course it implies that the clients play by the rules,
and we know that they don't all, so the question is what we're
willing to give up for the security of our online personas. Kinda
mirrors today's large questions in the WoT(tm), eh? Maybe people
are more willing to give up their rights than their conveniences
for some added security.

CVR> Are we hitting a point where mail list servers have to act as
CVR> blind front ends for all of the subscribers, where replies
CVR> are processed by those servers, and the server then takes on
CVR> the job of acting as a troll-exterminator and spam blocker?
CVR> And what does that really mean for things like Mailman?

World domination of course. Because we /could/ add that stuff fairly
easily if we had the resources to expend on it. Would it still be
usable? For some audiences yes, others no. I'm fairly sure the kind
of anonymizing we're talking about would never fly in the Python and
Zope community, whereas it's probably essential in a less cloistered
environment like lists.apple.com.

Which leads me to believe that we need to make it much easier to
install themes or styles of lists, from the paranoid anonymizer to
the laissez-faire discussion list.

CVR> Happy Macworld Expo week, all. If you need me, I'll be in the
CVR> war room, beating my head against a wall.

Any chance you could make it down to DC for a side trip? We could
have a Mailman hacking sprint over a few dozen steamed Maryland blue
crabs and some cold ones.
:)

-Barry

_______________________________________________
Mailman-Developers mailing list
[EMAIL PROTECTED]
http://mail.python.org/mailman-21/listinfo/mailman-developers
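
For concreteness, the three scrubbing options raised in the message
(obscure the address, replace it altogether, or truncate it to the
local part) might be sketched roughly as below. This is only an
illustration, not Mailman code: the names ADDRESS_RE, scrub, obscure,
hide and truncate are all hypothetical, and the pattern is a
deliberately loose heuristic rather than a full address parser, so it
will miss unusual addresses and may match some strings that are not
addresses.

```python
import re

# Loose heuristic (an assumption, not an RFC-complete parser):
# local-part characters, "@", then a domain with at least one dot.
ADDRESS_RE = re.compile(
    r"([A-Za-z0-9._%+-]+)@([A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+)"
)


def obscure(match):
    """Obscure: barry@python.org -> barry--at--python--dot--org."""
    local, domain = match.groups()
    return local + "--at--" + domain.replace(".", "--dot--")


def hide(match):
    """Replace the address altogether with a fixed placeholder."""
    return "[hidden email address]"


def truncate(match):
    """Keep only the local part of the address."""
    return "[%s's email address]" % match.group(1)


def scrub(text, strategy=obscure):
    """Rewrite every address-like string in text using strategy."""
    return ADDRESS_RE.sub(strategy, text)


if __name__ == "__main__":
    sample = "Mail barry@python.org today."
    for strategy in (obscure, hide, truncate):
        print(scrub(sample, strategy))
```

The same function could in principle run either before archiving or
inside a CGI at display time. The obscured form is the only one of
the three that a reader can turn back into a working address, which
also means a harvester that knows the convention can do the same.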