Hello all,
Given the recent press coverage of blog comment spam, I think it's time we organized a cross-platform project to address the problem. A considerable number of plugins have been written for various blog software with the intent of reducing comment spam, but many are ineffective or require a tremendous amount of work to maintain (Jay's mt-blacklist plugin is definitely the latter).
http://news.netcraft.com/archives/2004/12/17/hosts_disable_movable_type_as_comment_spam_slows_servers.html
http://it.slashdot.org/article.pl?sid=04/12/18/1827225&tid=111&tid=128
http://www.sixapart.com/log/2004/12/more_on_comment.shtml
I propose that we create a subproject of Apache SpamAssassin to encourage collaborative research into fighting blog spam, with the goal of producing cross-platform standards and implementations of workable comment spam solutions. SpamAssassin's anti-spam expertise in the e-mail domain will complement the weblogging community's knowledge.
Here are some of the ideas that I would like to explore further and see incorporated into standard installations of blogging software:
* Proof-of-work: A legitimate user takes anywhere from several seconds to several minutes to write each unique comment, while a comment spammer sends comments out as fast as possible. Consider adding to comment submission forms a proof-of-work algorithm that runs within the browser (e.g. JavaScript, Java, ActiveX). The weblog software can then safely reject every comment submission that lacks a valid proof of work. Legitimate users will not be inconvenienced by a short delay when they submit a comment, while spammers will no longer be able to submit comments in large volumes. For example, if a typical comment spammer sends 1,000,000 comments per day and the proof of work requires 2 seconds of compute time, that is 2,000,000 seconds of computation per day; since a day has only 86,400 seconds, the spammer would need to dedicate 24 machines to proof-of-work computation to maintain that rate. The drawback of this method is that users with less capable browsers, or with older, slower computers, may be unable to post comments.
There is a JavaScript implementation of Hashcash that could be combined with SpamAssassin's hashcash verification and duplicate-detection algorithms to produce a prototype quickly.
* Collaborative filtering: IronPort maintains a database of e-mail server traffic volumes called SenderBase. Mail servers can use SenderBase to spot "traffic spikes" and potentially block e-mail from the servers responsible. Something similar could be done for weblogs. As comments come in, weblogs could report the URLs they contain to a central server. If a URL is reported too often within a short period, it can be added to a list of probable spam URLs, and weblogs can quarantine or delete comments containing that URL.
* DNS-based URI Blocklists: SpamAssassin has had great success using Jeff Chan's Spam URI Realtime Blocklists. When an e-mail arrives, SpamAssassin extracts the URLs it contains and performs a few DNS TXT queries to determine whether those URLs have been reported in spam. These blocklists can be used for weblogs too. Instead of Jay maintaining a central blocklist that people download and install manually, mt-blacklist could query a DNS-based blocklist that is effectively updated in real time. This would significantly cut down on comment spam, because every weblog would benefit from new listings as soon as they are published and weblog owners would no longer need to maintain their blocklists by hand. The submission process could also be streamlined so that maintaining the list no longer consumes so much of any one person's time.
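To make the proof-of-work idea more concrete, here is a minimal Python sketch of both halves: the client-side search (which in practice would run in the browser, e.g. in JavaScript) and the server-side check with duplicate detection. It is not the JavaScript Hashcash implementation mentioned above; the stamp layout is only loosely modelled on hashcash's version-1 format, and the resource string, difficulty, and names (mint, StampVerifier) are hypothetical.

```python
import hashlib
import itertools
import os
import time


def leading_zero_bits(digest: bytes) -> int:
    """Count the leading zero bits of a hash digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        # For a nonzero byte, the zeros above its highest set bit are leading zeros.
        bits += 8 - byte.bit_length()
        break
    return bits


def mint(resource: str, bits: int) -> str:
    """Client side: search for a counter whose stamp's SHA-1 has `bits` leading zero bits."""
    if ":" in resource:
        raise ValueError("resource must not contain ':' (it is the field separator)")
    date = time.strftime("%y%m%d", time.gmtime())
    rand = os.urandom(8).hex()
    # Fields: version:bits:date:resource:ext(empty):rand:counter
    prefix = f"1:{bits}:{date}:{resource}::{rand}:"
    for counter in itertools.count():
        stamp = prefix + format(counter, "x")
        if leading_zero_bits(hashlib.sha1(stamp.encode()).digest()) >= bits:
            return stamp


class StampVerifier:
    """Server side: check the work and the resource, and reject reused stamps."""

    def __init__(self, resource: str, bits: int):
        self.resource = resource
        self.bits = bits
        # Duplicate detection; a real site would also check the date field
        # and expire old entries instead of growing this set forever.
        self.spent = set()

    def verify(self, stamp: str) -> bool:
        fields = stamp.split(":")
        if len(fields) != 7 or fields[0] != "1":
            return False
        claimed_bits, resource = fields[1], fields[3]
        if resource != self.resource:
            return False
        if not claimed_bits.isdigit() or int(claimed_bits) < self.bits:
            return False
        # Verification costs a single hash, however long minting took.
        if leading_zero_bits(hashlib.sha1(stamp.encode()).digest()) < self.bits:
            return False
        if stamp in self.spent:
            return False
        self.spent.add(stamp)
        return True
```

Each extra bit of difficulty doubles the expected work, so `bits` is the knob for tuning the delay a legitimate commenter sees: at 20 bits the client needs about a million hashes on average, while the server still verifies with one.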
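The collaborative-filtering idea could start life on the central server as a sliding-window counter. The sketch below is an assumption, not a description of how SenderBase works: it keys reports by hostname (so a spammer cannot evade it just by varying the path), and the window, threshold, and class name UrlSpikeDetector are hypothetical.

```python
from collections import defaultdict, deque
from urllib.parse import urlsplit


class UrlSpikeDetector:
    """Flag hosts whose URLs are reported by weblogs too often in a short window."""

    def __init__(self, window_seconds: float, threshold: int):
        self.window = window_seconds
        self.threshold = threshold
        self.reports = defaultdict(deque)  # host -> timestamps of recent reports
        self.flagged = set()               # probable spam hosts

    def report(self, url: str, now: float) -> bool:
        """Record one sighting of `url`; return True if its host is flagged.

        Assumes reports arrive in non-decreasing time order.
        """
        host = (urlsplit(url).hostname or "").lower()
        times = self.reports[host]
        times.append(now)
        # Drop reports that have fallen out of the window.
        while times and now - times[0] > self.window:
            times.popleft()
        if len(times) > self.threshold:
            # Flags are permanent in this sketch; a real list would expire them
            # and offer a way to appeal false positives.
            self.flagged.add(host)
        return host in self.flagged
```

A weblog would call `report` for each URL in an incoming comment (via whatever protocol the central server exposes) and quarantine the comment whenever it returns True.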
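For the DNS-based blocklist idea, the client side amounts to reducing each URL to a domain, appending the blocklist's zone, and resolving the result. This is a rough sketch under several assumptions: the zone uribl.example.org is hypothetical, the "last two labels" rule is a naive stand-in for the suffix handling real lists use (it gets domains like co.uk wrong), and because Python's standard library only resolves address records it checks for an A record in 127.0.0.0/8, the convention used by SURBL-style lists, rather than the TXT lookups mentioned above.

```python
import ipaddress
import socket
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical blocklist zone; a real deployment would use a published list.
BLOCKLIST_ZONE = "uribl.example.org"


def registered_domain(url: str) -> Optional[str]:
    """Reduce a URL to the domain a URI blocklist would list (naively)."""
    host = (urlsplit(url).hostname or "").rstrip(".")
    if not host:
        return None
    try:
        ipaddress.ip_address(host)
        return None  # IP-literal hosts need separate handling, omitted here
    except ValueError:
        pass
    # Naive: keep the last two labels, e.g. www.spam.example.com -> example.com
    return ".".join(host.split(".")[-2:])


def query_name(url: str, zone: str = BLOCKLIST_ZONE) -> Optional[str]:
    """Build the DNS name to look up, e.g. example.com.uribl.example.org."""
    domain = registered_domain(url)
    return f"{domain}.{zone}" if domain else None


def is_listed(url: str, zone: str = BLOCKLIST_ZONE) -> bool:
    """Return True if the blocklist zone has an entry for this URL's domain."""
    name = query_name(url, zone)
    if name is None:
        return False
    try:
        answer = socket.gethostbyname(name)
    except OSError:
        return False  # NXDOMAIN or resolver failure: treat as not listed
    # Listed names conventionally resolve to an address in 127.0.0.0/8.
    return answer.startswith("127.")
```

A comment handler in a plugin like mt-blacklist would call `is_listed` on every URL extracted from a submitted comment and quarantine the comment on any hit; since the list lives in DNS, new listings reach every weblog as soon as the zone is updated and cached answers expire.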
I would be very interested to hear any comments you may have on this proposal, and I encourage you to pass it on to your developer lists and to any weblog software developers I have missed.
I look forward to collaborating with you in the future.
Best regards,
Henry Stern
Committer, SpamAssassin
