While reading the article by Paul Graham, I started wondering: what happens if the user is not English-speaking?
Let's say 99.99% of all spam is in English (which matches my experience), and my mother tongue is Norwegian. ;) Let's also say that I almost never receive mail written in English. The Bayesian approach would then put all English words on a bad-words list (except words found in headers), and all Norwegian words on a good-words list, wouldn't it?

1. What happens the day I join an English mailing list, or receive a mail written in English?
2. What happens if I receive a mail written in Norwegian that contains a few English words, e.g. because it quotes someone?

I'd say it would discard mail #1 but let #2 through... What do you think? :)

On Sat, 2002-08-17 at 23:37, Chuq Von Rospach wrote:
> On 8/17/02 12:37 PM, "J C Lawrence" <[EMAIL PROTECTED]> wrote:
>
> > Keep thinking about it. In essence it is merely a finer grained
> > scoring system. It doesn't fundamentally change the spam cold war;
>
> Actually, I think it does fundamentally change it. You're not just making
> better guesses at what spammers say. You're effectively building a digital
> signature of what your REAL mail looks like, and comparing messages to it.
> The further it deviates from your real mail, the spammier it is.
>
> The only two ways for spammers to avoid this are to move to graphics, which
> can still be whacked on, and to stop using open relays and other things that
> leave noticeable signatures in the headers.
>
> It might not catch the smartest spam, but it'll sure catch everything else.
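To make the two cases concrete, here is a minimal Python sketch of a word-based Bayesian filter, loosely following the per-token formula from Graham's "A Plan for Spam" (good counts doubled, a 0.4 default for rarely seen tokens, probabilities clamped to [0.01, 0.99], only the 15 most "interesting" tokens combined, 0.9 as the spam threshold). The toy corpora, the whitespace tokenizer and the names `train`, `token_prob` and `spam_prob` are hypothetical simplifications for illustration, not anyone's actual filter, and headers are ignored entirely. With English-only spam and Norwegian-only ham, even everyday English words like "the", "you" and "here" end up near 0.99.

```python
from collections import Counter
from math import prod

# Hypothetical toy corpora: all spam is English, all legitimate mail is Norwegian.
SPAM = [
    "buy the cheap pills now",
    "click here to get the offer",
    "you can win the prize now",
] * 5
HAM = [
    "hei mamma jeg kommer hjem i morgen",
    "takk for middagen i går",
    "ses i morgen",
] * 5


def tokenize(text):
    # Much simpler than Graham's tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def train(messages):
    # Count how many times each token occurs across a corpus.
    counts = Counter()
    for msg in messages:
        counts.update(tokenize(msg))
    return counts


def token_prob(tok, good, bad, ngood, nbad):
    g = 2 * good[tok]  # good counts are doubled to bias against false positives
    b = bad[tok]
    if g + b < 5:
        return 0.4  # tokens seen too rarely (or never) get a fixed 0.4
    pg = min(1.0, g / ngood)
    pb = min(1.0, b / nbad)
    return max(0.01, min(0.99, pb / (pg + pb)))


def spam_prob(msg, good, bad, ngood, nbad, n_interesting=15):
    probs = [token_prob(t, good, bad, ngood, nbad) for t in set(tokenize(msg))]
    # Keep the tokens whose probabilities are furthest from 0.5.
    probs.sort(key=lambda p: abs(p - 0.5), reverse=True)
    top = probs[:n_interesting]
    p = prod(top)
    q = prod(1 - x for x in top)
    return p / (p + q)


GOOD, BAD = train(HAM), train(SPAM)

if __name__ == "__main__":
    # Case 1: an ordinary English mail, e.g. from a mailing list.
    print(spam_prob("here is the patch you asked for", GOOD, BAD, len(HAM), len(SPAM)))
    # Case 2: a Norwegian mail quoting a few English words.
    print(spam_prob("hei mamma takk for middagen i går here is the patch",
                    GOOD, BAD, len(HAM), len(SPAM)))
```

With these toy numbers the first message scores well above 0.9 and the second far below it, because the Norwegian tokens dominate the most interesting ones. A longer English quote would push more English tokens into the top 15 and could flip case #2 as well.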