RFC: New subproject, BlogSpamAssassin

Henry Stern 22 Dec 2004 19:44:37 -0000

Hello all,

Considering the latest press on blog comment spam, I think that it's
time that we organize a cross-platform project to address the problem.
There are a considerable number of plugins implemented for various blog
software with the intent of reducing blog spam, but many are ineffective
or require a tremendous amount of work to maintain (Jay's mt-blacklist
plugin is definitely the latter).

http://news.netcraft.com/archives/2004/12/17/hosts_disable_movable_type_as_comment_spam_slows_servers.html
http://it.slashdot.org/article.pl?sid=04/12/18/1827225&tid=111&tid=128
http://www.sixapart.com/log/2004/12/more_on_comment.shtml

I propose that we create a subproject of Apache SpamAssassin to
encourage collaborative research into fighting blog spam, with the
goal of producing cross-platform standards and implementations of
workable comment spam solutions.  SpamAssassin's anti-spam expertise
in the e-mail domain will complement the knowledge of the weblogging
community.

Here are some of the ideas that I would like to explore further and see
incorporated into standard installations of blogging software:

* Proof-of-work:  A legitimate user will take several seconds to minutes
to create each unique comment while a comment spammer sends them out as
fast as possible.  Consider a proof-of-work algorithm executed within
the browser (e.g. JavaScript, Java, ActiveX) added to comment submission
forms.  The weblog software can safely reject all comment submissions
that lack valid proof of work.  Legitimate users will not be
inconvenienced by a short delay as they submit their comment while
spammers will not be able to easily submit comments in large volumes.
For example, if a typical comment spammer sends 1,000,000 comments per
day and the proof of work requires 2 seconds of compute time, that is
2,000,000 CPU-seconds per day against 86,400 seconds in a day, so they
will need to dedicate 24 machines to proof-of-work computation to
maintain their rate of transmission.  The cons of this method are that
users without advanced browsers, or with older, slow computers, may not
be able to post comments.

There is a JavaScript implementation of Hashcash that can be combined
with SpamAssassin's Hashcash verification and duplicate detection
algorithms to quickly produce a prototype.
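
To make the proof-of-work idea concrete, here is a minimal Python sketch
of the mint/verify cycle plus the machine-count arithmetic above.  It is
an illustration under assumptions, not the real Hashcash stamp format or
SpamAssassin's code: the "bits:resource:salt:counter" layout and the
function names are hypothetical, and the resource and salt must not
contain ":".

```python
import hashlib
import math
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a digest."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()


def mint(resource: str, bits: int, salt: str) -> str:
    """Client side: try counters until the stamp's SHA-1 has `bits` leading zeros."""
    for counter in count():
        stamp = f"{bits}:{resource}:{salt}:{counter}"
        if leading_zero_bits(hashlib.sha1(stamp.encode()).digest()) >= bits:
            return stamp


def verify(stamp: str, resource: str, bits: int, seen: set) -> bool:
    """Server side: accept a stamp once, only for this resource and difficulty."""
    parts = stamp.split(":")
    if len(parts) != 4 or parts[0] != str(bits) or parts[1] != resource:
        return False
    if stamp in seen:  # duplicate detection: each stamp buys one comment
        return False
    if leading_zero_bits(hashlib.sha1(stamp.encode()).digest()) < bits:
        return False
    seen.add(stamp)
    return True


def machines_needed(comments_per_day: int, seconds_per_stamp: float) -> int:
    """Machines required to sustain a posting rate at a given cost per stamp."""
    return math.ceil(comments_per_day * seconds_per_stamp / 86400)
```

The weblog would hand out the resource (say, a post identifier) and a
salt with the comment form and call verify() on submission.  With the
numbers above, machines_needed(1_000_000, 2) gives 24.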

* Collaborative filtering:  IronPort maintains a database of e-mail
server traffic volumes called SenderBase.  Mail servers can use
SenderBase to find "traffic spikes" and potentially block e-mail from
those servers.  Something similar could be done for weblogs.  As
comments come in, weblogs could report the URLs in the comments to a
central server.  If a URL is reported too many times in a short period,
it can be added to a list of probable spam URLs and weblogs can
quarantine or delete comments containing that URL.
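
As a rough sketch of what the central server might do, the Python below
counts reports per URL inside a sliding time window and flags a URL once
it exceeds a threshold.  The class name, the threshold and the window
length are all hypothetical; a real service would also need to
authenticate reporters and weigh reports from distinct weblogs.

```python
from collections import defaultdict, deque


class UrlRateTracker:
    """Flag URLs that are reported more than max_reports times per window."""

    def __init__(self, max_reports: int, window_seconds: float):
        self.max_reports = max_reports
        self.window_seconds = window_seconds
        self._reports = defaultdict(deque)  # url -> timestamps of recent reports

    def report(self, url: str, now: float) -> bool:
        """Record one sighting; return True if the URL is now probable spam."""
        times = self._reports[url]
        times.append(now)
        # Forget sightings that have fallen out of the window.
        while times and now - times[0] > self.window_seconds:
            times.popleft()
        return len(times) > self.max_reports
```

A weblog would call report() for each URL in an incoming comment and
quarantine the comment whenever it returns True.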

* DNS-based URI Blocklists:  SpamAssassin has had great success using
Jeff Chan's Spam URI Realtime Blocklists.  When an e-mail arrives,
SpamAssassin extracts the URLs contained within and performs a few DNS
TXT queries to find whether the URL has been reported in spam.  These
blocklists can be used for weblogs too.  Instead of Jay maintaining a
central blocklist that people download and install manually,
mt-blacklist could use a DNS-based blocklist that is effectively updated
in real time.  This would significantly cut down on comment spam because
weblog owners would not need to actively maintain their blocklists.  The
submission process could be streamlined so that it doesn't consume so
much of any one person's time.
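
For illustration, a weblog-side lookup could look roughly like the
Python sketch below.  Several things here are my assumptions rather than
how SpamAssassin or mt-blacklist do it: the default zone name, the
helper names, the naive "last two labels" domain extraction (which is
wrong for names such as example.co.uk), and the use of an ordinary
address lookup instead of a TXT query, since the Python standard library
has no TXT resolver.  A name that resolves is treated as listed.

```python
import re
import socket
from urllib.parse import urlsplit

# Rough URL matcher for comment text; not a full URL grammar.
URL_RE = re.compile(r"https?://[^\s<>\"'\[\]]+", re.IGNORECASE)


def comment_domains(text: str) -> set:
    """Return the (naively reduced) domains of all URLs in a comment."""
    domains = set()
    for url in URL_RE.findall(text):
        host = urlsplit(url.rstrip(".,;:!?)")).hostname
        if host:
            labels = host.lower().rstrip(".").split(".")
            domains.add(".".join(labels[-2:]))
    return domains


def is_listed(domain: str, zone: str = "multi.surbl.org",
              resolve=socket.gethostbyname) -> bool:
    """Query <domain>.<zone>; any answer means the domain is listed."""
    try:
        resolve(f"{domain}.{zone}")
    except OSError:  # includes socket.gaierror for "no such name"
        return False
    return True


def comment_is_spam(text: str, zone: str = "multi.surbl.org",
                    resolve=socket.gethostbyname) -> bool:
    """True if any URL domain in the comment appears on the blocklist."""
    return any(is_listed(d, zone, resolve) for d in comment_domains(text))
```

The resolve parameter exists so the check can be exercised without
touching the network, and so a caching resolver could be swapped in.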

I'm very interested to hear any comments that you may have on this idea
and encourage you to pass this information on to your developer lists as
well as to other weblog software developers that I have missed.

I look forward to collaborating with you in the future.

Best regards,

Henry Stern
Committer, SpamAssassin
