I think I am working towards a rather broader critique of the way that SpamBayes etc. are applied.
Naïve Bayesian learning schemes are intrinsically vulnerable to counter-programming. They work on a small scale only because the payoff from counter-programming them is too small to attract attackers.

I am reminded of the chess match between Kasparov and Deep Blue. One of the professors at MIT who works on computer chess told me that they could have taught Kasparov how to outwit the machine by exploiting weaknesses in the computer's strategy.

In general, an attacker can intentionally teach any naïve learning approach to identify a certain characteristic as a strong indicator of spam. Once the attacker can control the learning system's state there is no end to the tricks that can be played.

The common theme at the MIT conference is that the way you test an anti-spam measure is against a static test corpus. What is left unmeasured is the resistance to counter-programming.

I believe that what opponents of the DKIM approach describe as a vulnerability of DKIM is in fact an intrinsic weakness of the spam filtering techniques described, and that the DKIM exploit is merely one example of a much wider class of attacks against those schemes.

This objection is not coming from large-scale anti-spam filtering operations; it is coming from people who run SpamAssassin on their personal email file and take a look at the rules their system is building.

> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of J.D. Falk
> Sent: Tuesday, August 22, 2006 7:41 PM
> To: [email protected]
> Subject: Re: [ietf-dkim] Bayesian filters are the pits
>
> On 2006-08-22 12:56, Hallam-Baker, Phillip wrote:
>
> > Third we need to promote the idea that you should not look for the
> > existence or even the validity of a DKIM header as being as important
> > as the domain that is claiming responsibility. If you can't correlate
> > the domain to some form of additional information you should ignore
> > the record entirely.
>
> That's generally true in a simplistic spam / not spam decision.
> If you're making a forged / not forged decision, the record is
> still useful.
>
> This has nothing to do with naive Bayes, but everything to do
> with naive mail administrators looking for simple binary
> spam / not spam criteria.
>
> --
> J.D. Falk, Anti-Spam Product Manager
> Yahoo! Communications Platform Team
> _______________________________________________
> NOTE WELL: This list operates according to
> http://mipassoc.org/dkim/ietf-list-rules.html
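
To make the counter-programming argument at the top of this message concrete, here is a minimal sketch in Python. It is a toy per-token naive Bayes scorer, not SpamBayes or SpamAssassin code. The class name `ToyNaiveBayes`, the training messages, and the feature token `dkim:victim.example` are all hypothetical, and class priors are deliberately left out so that any shift in the score comes from token evidence alone.

```python
import math
from collections import Counter


class ToyNaiveBayes:
    """Hypothetical toy filter: per-token log likelihood ratios, add-one smoothing."""

    def __init__(self):
        self.token_docs = {"spam": Counter(), "ham": Counter()}
        self.total_docs = {"spam": 0, "ham": 0}

    def train(self, tokens, label):
        # Count each token at most once per message.
        self.token_docs[label].update(set(tokens))
        self.total_docs[label] += 1

    def token_weight(self, token):
        # log( P(token | spam) / P(token | ham) ), smoothed so it is never infinite.
        p_spam = (self.token_docs["spam"][token] + 1) / (self.total_docs["spam"] + 2)
        p_ham = (self.token_docs["ham"][token] + 1) / (self.total_docs["ham"] + 2)
        return math.log(p_spam / p_ham)

    def spam_score(self, tokens):
        # Sum of token weights; positive leans spam, negative leans ham.
        return sum(self.token_weight(t) for t in set(tokens))


filt = ToyNaiveBayes()

# Ordinary training: the victim domain's signature shows up in some ham.
for _ in range(15):
    filt.train(["hello", "meeting", "agenda"], "ham")
for _ in range(5):
    filt.train(["hello", "invoice", "dkim:victim.example"], "ham")
for _ in range(20):
    filt.train(["hello", "cheap", "pills"], "spam")

legit = ["hello", "dkim:victim.example"]
before = filt.spam_score(legit)

# Counter-programming: the attacker sends junk carrying the victim's feature,
# and the recipient (or an auto-trainer) dutifully files it as spam.
for _ in range(100):
    filt.train(["hello", "cheap", "dkim:victim.example"], "spam")

after = filt.spam_score(legit)
print(f"before poisoning: {before:+.2f}, after poisoning: {after:+.2f}")
```

In this toy setup the legitimate message scores below zero before the poisoning run and above zero after it, even though the message itself has not changed. A static test corpus would never show this, because the corpus does not respond to what the filter has learned.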
