[ietf-dkim] Bayesian filters are the pits

Hallam-Baker, Phillip Tue, 22 Aug 2006 13:00:36 -0700

I have been looking through some of the responses from the spamming community to SPF. I conclude that the real problem here is that people using naive Bayesian filtering don't have a clue.

The problem with the Bayesian approach is that it is very vulnerable to counter-programming by spammers. When SPF started to gain traction, the spammers realized that they could deter adoption of SPF simply by introducing SPF data into their systems. After a short while, the naive Bayesian schemes would generate a large negative score for the mere presence of SPF data.
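To make the effect concrete, here is a rough sketch of how a per-feature score in a naive Bayesian filter can flip sign once spammers start carrying the feature. The function name, the smoothing choice and all the counts are my own invention for illustration; real filters differ in detail. A negative score here means "counts against the message".

```python
import math

def feature_score(ham_with, ham_total, spam_with, spam_total):
    """Log-likelihood ratio for one feature (hypothetical helper).

    Positive means the feature is more common in legitimate mail,
    negative means it is more common in spam. Add-one smoothing is
    an assumption made here to avoid division by zero.
    """
    p_ham = (ham_with + 1) / (ham_total + 2)
    p_spam = (spam_with + 1) / (spam_total + 2)
    return math.log(p_ham / p_spam)

# Invented training counts, not measurements.
# Before: few spammers have SPF data, so its presence looks good.
before = feature_score(ham_with=300, ham_total=1000,
                       spam_with=10, spam_total=1000)

# After: spammers add SPF data everywhere, so its presence looks bad,
# even though legitimate senders have not changed at all.
after = feature_score(ham_with=300, ham_total=1000,
                      spam_with=900, spam_total=1000)

print(before > 0, after < 0)
```

Nothing about the legitimate senders changed between the two cases; only the spammers' behaviour did, and that alone was enough to turn the feature against everyone who uses it.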

The solution to this problem is mostly a matter of marketing communications rather than technology.

First we need to get across the fact that spam filtering companies do not in general use the naive Bayesian filtering approaches popularized by Paul Graham and promoted by the conference at MIT.

Second we need a simple fix to deter the 'jamming' attack by spammers: a rule that says that certain features can never result in negative scores. SPF/Sender-ID and DKIM should be among them.
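The rule could be as small as the following sketch. The feature names and the score table are hypothetical, and the convention is the same as in the text: a negative score counts against the message.

```python
# Hypothetical feature names; a real filter would use its own tokens.
PROTECTED_FEATURES = frozenset({"spf_pass", "sender_id_pass", "dkim_valid"})

def clamp_scores(scores):
    """Return a copy of the score table in which protected
    authentication features can never count against a message."""
    return {
        name: max(0.0, score) if name in PROTECTED_FEATURES else score
        for name, score in scores.items()
    }

def message_score(features, scores):
    """Sum the clamped scores of the features seen in one message.
    Features missing from the table contribute nothing."""
    clamped = clamp_scores(scores)
    return sum(clamped.get(name, 0.0) for name in features)

# Invented learned scores: the jamming attack has pushed "spf_pass" negative.
learned = {"spf_pass": -1.1, "dkim_valid": 0.5, "cheap_pills": -3.0}

print(clamp_scores(learned)["spf_pass"])             # 0.0
print(message_score(["spf_pass", "cheap_pills"], learned))  # -3.0
```

The clamp leaves every other feature alone, so the filter still learns normally; it just stops the spammers from teaching it that authentication is a bad sign.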

Third we need to promote the idea that the existence, or even the validity, of a DKIM header matters less than the domain that is claiming responsibility. If you can't correlate the domain with some form of additional information, you should ignore the record entirely.
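As a sketch of what that means in practice: the signature only tells you which domain to look up, and the score comes from what you know about that domain. The reputation table and function name below are hypothetical, and where the reputation data comes from is deliberately left open.

```python
def dkim_contribution(signature_valid, signing_domain, reputation):
    """Score contribution from a DKIM signature (hypothetical helper).

    The signature itself earns nothing. It only identifies the domain
    claiming responsibility; the score is whatever is known about that
    domain. An unknown domain is ignored entirely.
    """
    if not signature_valid or not signing_domain:
        return 0.0
    # Domain names are case-insensitive, so normalise before the lookup.
    return reputation.get(signing_domain.lower(), 0.0)

# Invented reputation data for illustration only.
reputation = {"example.com": 2.0}

print(dkim_contribution(True, "Example.COM", reputation))    # 2.0
print(dkim_contribution(True, "unknown.example", reputation))  # 0.0
print(dkim_contribution(False, "example.com", reputation))   # 0.0
```

A spammer who signs with a throwaway domain gains nothing under this scheme, because a valid signature from a domain nobody knows anything about is worth exactly zero.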

_______________________________________________
NOTE WELL: This list operates according to 
http://mipassoc.org/dkim/ietf-list-rules.html
