Re: RFC: New subproject, BlogSpamAssassin

Henry Stern 23 Dec 2004 01:13:06 -0000

There is no permanent solution to email spam (not yet anyway) and I
doubt there will be one for weblogs, it's an arms race ;) SA3 could go a


Weblog spam is completely different from e-mail spam.  The objective of
the e-mail spammer is for you to read their message and respond quickly.
The opposite holds true for the weblog spammer.  The spammer needs their
comments to remain undetected (or at least undeleted) to boost and
maintain the pagerank of the site that they are spamming for.

Practically speaking, it doesn't matter if a spammer is able to create a
comment on someone's blog.  So long as the comment is deleted or
otherwise altered before the search engine indexes the page, they have
gained nothing for their trouble.  Care must be taken that legitimate
links are not disabled: bloggers like their pagerank, and disabling
legitimate links reduces the effectiveness of search engines by
preventing them from finding pages that may only be linked from weblog
comments.
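To illustrate the "deleted or otherwise altered before the search engine
indexes the page" idea, here is a minimal Python sketch of a moderation
step run before a comment is published.  It assumes some classifier has
already decided whether the comment is spam (the `is_spam` flag is a
hypothetical input), and it uses a regular expression instead of a proper
HTML parser, which is a simplification.  The names `neutralise_links` and
`moderate` are made up for this example.

```python
import re
from typing import Optional

# Matches an HTML anchor and captures its visible text. A regex is a
# simplification; production code should use a real HTML parser.
ANCHOR_RE = re.compile(r"<a\b[^>]*>(.*?)</a>", re.IGNORECASE | re.DOTALL)


def neutralise_links(comment_html: str) -> str:
    """Replace each <a ...>text</a> with its plain text so the comment
    no longer contains links a search engine could follow."""
    return ANCHOR_RE.sub(r"\1", comment_html)


def moderate(comment_html: str, is_spam: bool,
             delete_spam: bool = True) -> Optional[str]:
    """Return the HTML to publish, or None if the comment is dropped.

    `is_spam` is a hypothetical input from whatever classifier is in use.
    """
    if not is_spam:
        return comment_html                # legitimate links stay intact
    if delete_spam:
        return None                        # remove the spam comment entirely
    return neutralise_links(comment_html)  # keep the text, drop the links
```

Leaving legitimate comments untouched is deliberate, so their links keep
counting for search engines while spam links gain nothing.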

A permanent solution to weblog spam would be one that requires the
comment spammer to expend more resources to send the spam than they can
gain through spamming while minimising the amount of resources that the
weblog owner needs to expend.  These resources include time, money and
compute cycles.
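One way to make this cost/benefit condition concrete is a back-of-the-
envelope model.  The symbols are assumptions introduced for illustration,
not taken from the discussion: a spammer posts $n$ comments at a cost $c$
each (time, money, compute cycles), each comment survives until the next
search-engine crawl with probability $p$, and a surviving comment is worth
$v$ to the spammer in pagerank.

```latex
\mathbb{E}[\text{profit}] = n\,(p\,v - c),
\qquad
\text{so spamming stops paying once } c > p\,v.
```

Deleting or de-linking spam before the page is indexed pushes $p$ towards
zero, so the condition holds even when posting is cheap; the weblog
owner's own cost of filtering each comment is left out of this simplified
model.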

Users need to feel that they can trust the solution.  Two examples from
the e-mail domain where that trust has been questioned are MAPS RBL and
IronPort BondedSender.  In the case
of MAPS RBL, Paul Vixie blocked journalists who published articles that
he considered to be "pro spam" or "anti MAPS."  In the case of
BondedSender, some users incorrectly assume that IronPort is publishing
a list of organisations who have paid them for the privilege of spamming.

The solution should not interfere with legitimate discussions.  Delaying
the publication of legitimate comments means that users cannot easily
engage in discussions.

To summarize, I think that a permanent solution to weblog spam needs to:

1.  Eliminate the benefits of spamming (boost in pagerank).
2.  Not eliminate the benefits of linking in legitimate comments.
3.  Require minimal maintenance.
4.  Be accountable to, and trusted by, its users.
5.  Not disrupt or delay legitimate comments.

I suspect that much of SpamAssassin's PMC is on vacation or otherwise
occupied right now.  Once people are back and I can get some +1s, I
think that a good start for this project would be to use the wiki to
critically analyse the various anti-blogspam offerings to identify their
strengths and weaknesses.

I have managed to kill almost all my blog spam using SpamAssassin and
Jay's BL rules, which I converted to SA3. I don't have a good database
yet but it is getting better. I also haven't tapped half of SA's
capabilities. I would love to test it on a heavily hit site though, to
see how it scales.


It may be interesting to roll out a web-oriented version of
SpamAssassin's rule set with the e-mail specific rules replaced by
web-specific rules and the scores optimised accordingly.  Something like
this could also be useful for wikis and other open-submission web
applications.
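A web-oriented rule set could keep SpamAssassin's basic model, in which
each matching rule adds a score and an item is flagged once the total
reaches a threshold.  The Python sketch below illustrates that model only;
it is not SpamAssassin's rule syntax or API, and the rule names, patterns
and scores are hypothetical placeholders that would need to be optimised
against real comment data.  The threshold of 5.0 mirrors the default of
SpamAssassin's `required_score` setting.

```python
import re
from typing import Callable, List, NamedTuple, Sequence, Tuple


class Rule(NamedTuple):
    name: str
    test: Callable[[str], bool]  # returns True if the rule "hits"
    score: float


LINK_RE = re.compile(r"<a\b[^>]*\bhref=", re.IGNORECASE)
TAG_RE = re.compile(r"<[^>]+>")
PHARMA_RE = re.compile(r"\b(?:viagra|cialis|casino)\b", re.IGNORECASE)


def link_count(text: str) -> int:
    """Number of <a href=...> anchors in the comment."""
    return len(LINK_RE.findall(text))


def visible_words(text: str) -> int:
    """Word count of the comment with HTML tags stripped."""
    return len(TAG_RE.sub(" ", text).split())


# Hypothetical web-specific rules; scores are placeholders, not tuned values.
WEB_RULES: Sequence[Rule] = (
    Rule("WEB_MANY_LINKS", lambda t: link_count(t) >= 3, 3.0),
    Rule("WEB_PHARMA_WORDS", lambda t: PHARMA_RE.search(t) is not None, 2.5),
    Rule("WEB_LINK_ONLY",
         lambda t: link_count(t) > 0 and visible_words(t) <= 3, 1.5),
)

THRESHOLD = 5.0  # same as SpamAssassin's default required_score


def score_comment(text: str,
                  rules: Sequence[Rule] = WEB_RULES) -> Tuple[float, List[str]]:
    """Sum the scores of all rules that hit and report their names."""
    hits = [rule for rule in rules if rule.test(text)]
    return sum((rule.score for rule in hits), 0.0), [rule.name for rule in hits]


def is_spam(text: str, rules: Sequence[Rule] = WEB_RULES,
            threshold: float = THRESHOLD) -> bool:
    """Flag the comment once its total score reaches the threshold."""
    total, _ = score_comment(text, rules)
    return total >= threshold
```

Because each hit is reported by name, a blog owner can see why a comment
was flagged, which speaks to the accountability requirement in the list
above.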

As a motivator to get this up and running quickly, we might consider
putting together a paper submission for this year's CEAS conference
(http://www.ceas.cc).  The deadline is March 16th for an extended
abstract or full paper.

Henry
