I have a reason for my bragging here in case anyone is wondering. The reason is - the URI lists are no longer my best trick. It's now second best - maybe even third.
My best trick - at least for message content - is to use a second Bayesian filter - I'm using SpamProbe - and feed it custom tokens, not the whole message.
First - I enhance the headers using Exim, adding the results of a lot of DNS lookups: zone info for the parent of the sender's class C IP block, MX records, etc. I fatten up the headers with anything that would tend to fingerprint the message.
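To make the "parent of the class C block" idea concrete, here is a minimal Python sketch (not the Exim configuration described above, which isn't shown). It only computes the reverse-DNS zone name that covers the sender's /24, assuming that is the zone meant here. The function name `parent_reverse_zone` is made up for illustration, and the actual NS/SOA/MX lookups are left out.

```python
import ipaddress

def parent_reverse_zone(sender_ip):
    """Return the in-addr.arpa zone covering the sender's /24 (class C) block.

    Assumption: "zone info in the parent of the class C block" refers to
    this reverse zone. Only IPv4 is handled in this sketch.
    """
    addr = ipaddress.ip_address(sender_ip)
    if addr.version != 4:
        raise ValueError("this sketch only handles IPv4 senders")
    # reverse_pointer looks like "55.2.0.192.in-addr.arpa";
    # dropping the first label leaves the zone for the whole /24.
    return addr.reverse_pointer.split(".", 1)[1]
```

The result could then be queried for NS or SOA records and written into an extra header for the filter to tokenize.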
Then - I discard almost all of the message body, leaving only URIs and any references to filenames, phone numbers, and email addresses. I also make tokens out of every combination of two header names and add them as if they were message lines. Then I feed it all into SpamProbe for scoring, using a central database for all users. SpamProbe adds a header with 11 levels - 5 spam, 5 ham, and neutral - but the scoring really hugs the ends. These levels are scored by SA along with everything else.
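As an illustration only, the reduction step could look roughly like this in Python. This is a sketch under assumptions, not the actual setup: the regular expressions, the file-extension list, and the `X-Header-Pair` pseudo-header name are all invented for the example.

```python
import re
from email import message_from_string
from itertools import combinations

# Rough patterns; all of these are assumptions made for the sketch.
URI_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\-. ()]{7,}\d")
FILE_RE = re.compile(r"\b[\w-]+\.(?:exe|zip|pdf|doc|scr|pif)\b", re.IGNORECASE)

def body_text(msg):
    """Collect the decoded text parts of a message."""
    chunks = []
    for part in msg.walk():
        if part.get_content_maintype() == "text":
            payload = part.get_payload(decode=True) or b""
            chunks.append(payload.decode("utf-8", errors="replace"))
    return "\n".join(chunks)

def reduce_message(raw):
    """Keep headers, header-name pairs, and only the 'hot' body fragments."""
    msg = message_from_string(raw)
    lines = [f"{name}: {value}" for name, value in msg.items()]
    # One pseudo-line per pair of distinct header names.
    names = sorted({name.lower() for name in msg.keys()})
    lines += [f"X-Header-Pair: {a}+{b}" for a, b in combinations(names, 2)]
    # Everything else in the body is thrown away.
    body = body_text(msg)
    for pattern in (URI_RE, EMAIL_RE, PHONE_RE, FILE_RE):
        lines += pattern.findall(body)
    return "\n".join(lines)
```

The reduced text is what would be handed to the second filter in place of the original message, which is also why the shared database stays small.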
The results give SA an order of magnitude better accuracy. In particular, it tends to catch 419 spam, and it tends to save commercial nonspam like real bank statements and airline ticket confirmations. It is totally immune to spam that adds words to defeat Bayes, because I discard the body. And it learns extremely fast: one manual retraining on a message from a nonspam source and that source becomes nonspam forever.
The central database is fairly small because I'm stripping out the body.
The second Bayes filter is trained on the same data as SA's internal Bayes filter. I key on the autolearn tag coming out of SA.
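A minimal sketch of keying on the autolearn tag, in Python and for illustration only. It assumes the tag appears as `autolearn=spam` or `autolearn=ham` in SA's `X-Spam-Status` header; the function name `autolearn_verdict` is made up, and how the verdict is then passed to SpamProbe (for example via its `spam` and `good` commands) is left as an assumption.

```python
import re
from email import message_from_string

AUTOLEARN_RE = re.compile(r"autolearn=(\w+)")

def autolearn_verdict(raw):
    """Return "spam", "ham", or None based on SA's autolearn tag."""
    msg = message_from_string(raw)
    status = msg.get("X-Spam-Status", "")
    match = AUTOLEARN_RE.search(status)
    if not match:
        return None
    verdict = match.group(1).lower()
    # Values such as "no" or "failed" mean SA did not learn from this
    # message, so the second filter should not be trained on it either.
    return verdict if verdict in ("spam", "ham") else None
```

Only messages with a "spam" or "ham" verdict would be used for training, so both filters learn from the same set.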
My point - it took over a year to convince you all that looking at the links was a good idea. Now I have another good idea that actually works. Using a second Bayes filter that only looks at the hot parts of the message - the headers and such - and ignores the cooler parts - the body - really works. I've been running it for over a month now.
It is my hope - like last time - that you smarter people will test this, do it right, and make it far better than my crude SpamProbe kludge. And if someone has a history of messages from years ago, you can go back before URI blacklists and verify that I was pushing that for a long time before you all caught on.
My filter right now is so accurate that I could run an open relay and no one would know I was doing it. So - hope I got your attention this time.
Sidney Markowitz wrote:
Daniel and SpamAssassin are on Slashdot!
http://it.slashdot.org/article.pl?sid=05/03/04/2010218&tid=111
-- sidney
-- Marc Perkel - [EMAIL PROTECTED]
Spam Filter: http://www.junkemailfilter.com My Blog: http://marc.perkel.com My Religion: http://www.churchofreality.org ~ "If it's real - we believe in it!" ~
