RE: [ietf-dkim] Bayesian filters are the pits

Hallam-Baker, Phillip Wed, 23 Aug 2006 07:51:48 -0700

I think I am working towards a rather broader critique of the way that 
SpamBayes etc. are applied.

Naïve Bayesian learning schemes are intrinsically vulnerable to 
counter-programming. They work on a small scale only because, at that scale, 
counter-programming them is not worth an attacker's effort.

I am reminded of the chess match between Kasparov and Deep Blue. One of the 
professors at MIT who works on computer chess told me that they could have 
taught Kasparov how to outwit the machine by exploiting weaknesses in its 
strategy.

In general, any naïve learning approach can be intentionally taught by an 
attacker to identify a certain characteristic as a strong indicator of spam. 
Once the attacker can control the state of the learning system, there is no 
end to the tricks that can be played.
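
To make the point concrete, here is a minimal sketch of that kind of 
counter-programming. It is not how SpamBayes or any real filter is 
implemented: the ToyBayesFilter class, the token names (including 
"dkim-signed") and the message counts are all hypothetical, and the scoring 
is a simplified presence-based naive Bayes with add-one smoothing that skips 
tokens it has never seen.

```python
import math
from collections import Counter


class ToyBayesFilter:
    """Presence-based naive Bayes scorer with add-one smoothing (illustration only)."""

    def __init__(self):
        # Per-class document frequency of each token, and per-class message counts.
        self.doc_freq = {"spam": Counter(), "ham": Counter()}
        self.n_docs = {"spam": 0, "ham": 0}

    def train(self, tokens, label):
        # Count each token at most once per message.
        self.doc_freq[label].update(set(tokens))
        self.n_docs[label] += 1

    def _p(self, token, label):
        # Smoothed estimate of P(token present | label).
        return (self.doc_freq[label][token] + 1) / (self.n_docs[label] + 2)

    def spam_log_odds(self, tokens):
        # Sum of per-token log-likelihood ratios; > 0 leans spam, < 0 leans ham.
        # Tokens never seen in training are skipped (an assumption of this sketch).
        score = 0.0
        for t in set(tokens):
            if self.doc_freq["spam"][t] + self.doc_freq["ham"][t] == 0:
                continue
            score += math.log(self._p(t, "spam") / self._p(t, "ham"))
        return score


f = ToyBayesFilter()

# Hypothetical baseline corpus: a few legitimate messages carry the target token,
# and no spam does.
for i in range(10):
    ham = ["meeting", "agenda"] + (["dkim-signed"] if i < 2 else [])
    f.train(ham, "ham")
    f.train(["cheap", "pills", "offer"], "spam")

victim = ["dkim-signed", "invoice"]  # a legitimate message the attacker wants blocked
before = f.spam_log_odds(victim)

# The attacker floods the user with spam that carries the target token;
# the user dutifully trains each one as spam.
for _ in range(200):
    f.train(["dkim-signed", "cheap", "pills"], "spam")

after = f.spam_log_odds(victim)
print(f"before poisoning: {before:+.2f}, after poisoning: {after:+.2f}")
```

In this toy setup the score for the legitimate message moves from negative 
(ham) to positive (spam) purely because of what the attacker chose to send. 
The same would happen with any other token the attacker can place in mail 
that ends up in the training set.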

The common theme at the MIT conference is that the way you test an anti-spam 
measure is against a static test corpus. What is left unmeasured is the 
resistance to counter-programming.

I believe that what opponents of the DKIM approach describe as a vulnerability 
of DKIM is in fact an intrinsic weakness of the spam filtering techniques 
described, and that the DKIM exploit is merely one example of a much wider 
class of attacks against those schemes.

This objection is not coming from large-scale anti-spam filtering operations; 
it is coming from people who run SpamAssassin on their personal mailbox and 
take a look at the rules their system is building.

> -----Original Message-----
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf Of J.D. Falk
> Sent: Tuesday, August 22, 2006 7:41 PM
> To: [email protected]
> Subject: Re: [ietf-dkim] Bayesian filters are the pits
> 
> On 2006-08-22 12:56, Hallam-Baker, Phillip wrote:
> 
> > Third we need to promote the idea that you should not look for the 
> > existence or even the validity of a DKIM header as being as important 
> > as the domain that is claiming responsibility. If you can't correlate 
> > the domain to some form of additional information you should ignore 
> > the record entirely.
> 
> That's generally true in a simplistic spam / not spam 
> decision.  If you're making a forged / not forged decision, 
> the record is still useful.
> 
> This has nothing to do with naive Bayes, but everything to do 
> with naive mail administrators looking for simple binary spam 
> / not spam criteria.
> 
> --
> J.D. Falk, Anti-Spam Product Manager
> Yahoo! Communications Platform Team
> _______________________________________________
> NOTE WELL: This list operates according to 
> http://mipassoc.org/dkim/ietf-list-rules.html