There is no permanent solution to email spam (not yet anyway) and I doubt there will be one for weblogs, it's an arms race ;) SA3 could go a
Weblog spam is completely different from e-mail spam. The objective of the e-mail spammer is for you to read their message and respond quickly. The opposite holds true of the weblog spammer. The spammer needs their comments to remain undetected (or at least undeleted) to boost and maintain the pagerank of the site that they are spamming for.
Practically speaking, it doesn't matter if a spammer is able to create a comment on someone's blog. So long as the comment is deleted or otherwise altered before the search engine indexes the page, they have gained nothing for their trouble. Care must be taken, however, that legitimate links are not disabled: bloggers like their pagerank, and disabling such links reduces the effectiveness of search engines by preventing them from indexing pages that may be mentioned only in weblog comments.
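To illustrate the idea of altering a comment before it is indexed, here is a minimal sketch in Python. It assumes that search engines credit only anchor elements and not bare URLs in text, and that the weblog software tracks an `approved` flag per comment; `render_comment` and `URL_RE` are hypothetical names, not part of any existing weblog package.

```python
import html
import re

# Deliberately simple URL matcher for illustration; real comments need
# more careful handling (trailing punctuation, quoted URLs, etc.).
URL_RE = re.compile(r"https?://[^\s<>\"]+")


def render_comment(text: str, approved: bool) -> str:
    """Return HTML for a comment.

    Unapproved comments are shown immediately, but their URLs stay as
    plain text, so a crawler finds no anchor to follow. Approved
    comments get real links back.
    """
    escaped = html.escape(text)  # neutralises any markup the poster typed
    if not approved:
        return escaped
    return URL_RE.sub(
        lambda m: f'<a href="{m.group(0)}">{m.group(0)}</a>', escaped
    )
```

With this approach the comment itself is never delayed; only the activation of its links waits for review, which keeps legitimate links alive once they have been approved.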
A permanent solution to weblog spam would be one that requires the comment spammer to expend more resources to send the spam than they can gain through spamming while minimising the amount of resources that the weblog owner needs to expend. These resources include time, money and compute cycles.
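The cost argument above can be written down as a rough model. This is only a sketch under stated assumptions: the three parameters and the class name `SpamEconomics` are hypothetical, and the numbers in the usage lines are made up purely to show the arithmetic.

```python
from dataclasses import dataclass


@dataclass
class SpamEconomics:
    """Toy model of one spam comment from the spammer's point of view."""

    cost_per_comment: float        # resources spent to post one comment
    survival_rate: float           # fraction of comments still intact when indexed
    value_per_indexed_link: float  # gain from one link that gets indexed

    def expected_profit_per_comment(self) -> float:
        # Expected gain minus cost; the spammer only benefits from
        # comments that survive until the search engine sees them.
        return self.survival_rate * self.value_per_indexed_link - self.cost_per_comment

    def is_unprofitable(self) -> bool:
        return self.expected_profit_per_comment() < 0


# Illustrative values only, not measurements.
lax_blog = SpamEconomics(cost_per_comment=0.01, survival_rate=0.5,
                         value_per_indexed_link=0.10)
defended_blog = SpamEconomics(cost_per_comment=0.01, survival_rate=0.05,
                              value_per_indexed_link=0.10)
```

In this model a defence works either by raising `cost_per_comment` or by lowering `survival_rate`; the weblog owner's own costs would have to be tracked separately.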
Users need to feel that they can trust the solution. Two examples from the e-mail domain are MAPS RBL and IronPort BondedSender. In the case of MAPS RBL, Paul Vixie blocked journalists who published articles that he considered to be "pro spam" or "anti MAPS." In the case of BondedSender, some users incorrectly assume that IronPort is publishing a list of organisations who have paid them for the privilege of spamming.
The solution should not interfere with legitimate discussions. Delaying the posting of legitimate comments means that users cannot easily engage in discussions.
To summarize, I think that a permanent solution to weblog spam needs to:
1. Eliminate the benefits of spamming (boost in pagerank).
2. Not eliminate the benefits of linking in legitimate comments.
3. Require minimal maintenance.
4. Be accountable and trustworthy to the user.
5. Not disrupt or delay legitimate comments.
I suspect that much of SpamAssassin's PMC is on vacation or otherwise occupied right now. Once people are back and I can get some +1s, I think that a good start for this project would be to use the wiki to critically analyse the various anti-blogspam offerings to identify their strengths and weaknesses.
I have managed to kill almost all my blog spam using SpamAssassin and Jay's BL rules, which I converted to SA3. I don't have a good database yet, but it is getting better. I also haven't tapped half of SA's capability. I would love to test it on a heavily hit site, though, to see how it scales.
It may be interesting to roll out a web-oriented version of SpamAssassin's rule set with the e-mail specific rules replaced by web-specific rules and the scores optimised accordingly. Something like this could also be useful for wikis and other open-submission web applications.
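As a rough illustration of what web-specific rules with scores might look like, here is a small Python sketch of score-based filtering in the general style of SpamAssassin (each matching rule adds its score, and the total is compared with a threshold). It does not use SpamAssassin itself; the rule names, scores and threshold are invented for the example and have not been optimised against any corpus.

```python
import re
from typing import Callable, List, NamedTuple, Tuple

URL_RE = re.compile(r"https?://\S+", re.IGNORECASE)


class Rule(NamedTuple):
    name: str
    test: Callable[[str], bool]  # returns True when the rule fires
    score: float


# Hypothetical web-specific rules; the scores are guesses, not tuned values.
RULES: List[Rule] = [
    Rule("MANY_LINKS", lambda t: len(URL_RE.findall(t)) >= 3, 2.5),
    Rule("HTML_ANCHOR", lambda t: re.search(r"<a\s+href=", t, re.I) is not None, 1.0),
    Rule("BBCODE_URL", lambda t: "[url=" in t.lower(), 1.5),
]

THRESHOLD = 3.0  # assumed cut-off for treating a comment as spam


def score_comment(text: str) -> Tuple[float, List[str]]:
    """Return the total score and the names of the rules that fired."""
    hits = [rule for rule in RULES if rule.test(text)]
    return sum(rule.score for rule in hits), [rule.name for rule in hits]


def is_spam(text: str) -> bool:
    total, _ = score_comment(text)
    return total >= THRESHOLD
```

A comment with a single plain link scores zero here, so legitimate linking is not penalised, while a comment stuffed with anchors and URLs crosses the threshold. The same structure could be reused for wiki edits by swapping in a different rule list.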
As a motivator to get this up and running quickly, we might consider putting together a paper submission for this year's CEAS conference (http://www.ceas.cc). The deadline is March 16th for an extended abstract or full paper.
Henry
