How about 4 different super tests? I fail automatically on
=?ISO-8859-1?B?, and that accounts for more than 1% of the E-mail
coming in to my server, but only a handful of additional catches in
what was being missed...no false positives. I think I've mentioned
enough times, the other tests that I would like to have...a BODYTEXT
filter that searches just a decoded non-HTML body, a NOTEXT test for
nothing but spaces and returns and attachments (that's a key) after
decoding and de-HTMLifying, and a TEXTCOUNT marquee test that would
allow you to search for amounts of non-HTML decoded body text just
like SUBJECTSPACES and BCC, but in reverse (the less there is, the
higher the score). I could catch so much crap with those 40 or so
two-character gibberish strings; in fact, I think it was properly tagging
around 10% to 20% of all unique incoming messages today, if not more.
That gibberish subject filter is tagging over 5% by itself, and with
perfect accuracy so far. A functional gibberish body filter though
would have a reasonable number of false positives (was tagging buy.com
links that were shown in displayable text, for instance). I don't, of
course, expect Scott to rush to my aid here.

I have managed to add tests for SUBJECTSPACES (very effective), COMMENTS
(effective) and BCC (just ok), along with some small keyword/phrase
filters for the body, subject and sender, with very good success. I only
saw about 5 definitive false positives today out of around 3000 unique
messages, but approximately 150 pieces of spam got through. I think that
could be reduced by as much as half without a measurable impact on the
false positives. If that doesn't work, I'm buying a gun :)

BTW, on Linux, my guru buddy recommends Postfix as the SMTP client and
Webmin as the interface. I don't dispute Sandy's faith in MS SMTP,
though, and it can be run on the same box as IMail.

Matt

Dan Patnode wrote:

FYI, I pulled this test 3 weeks ago after an email from France came
through (or rather didn't) with this subject:

Subject: =?ISO-8859-1?B?RW5qb3kgc3VtbWVyIHVudGlsIGl0cyB2ZXJ5IGVuZCE=?=

There definitely is a correlation here among spammers, ?B? encoded
subjects, disposable domain names, and nothing else in the body of the
message. There has to be a way to bring the 2 or 3 variables together as
a super test.

Dan

On Monday, September 8, 2003 19:05, Matthew Bramble <[EMAIL PROTECTED]> wrote:
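
For anyone who wants to experiment with the "super test" idea outside of Declude, here is a rough Python sketch of how two of the variables discussed in this thread (a ?B? encoded subject and little or no visible body text) could be combined into one score. This is not how Declude works internally; the function names, the `min_text` threshold of 20 characters, and the one-point-per-signal scoring are all assumptions made up for illustration.

```python
import re
from email.header import decode_header
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collects character data between tags (hypothetical helper).

    Note: this naive version also collects <script>/<style> contents.
    """

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def is_b_encoded(raw_subject):
    # True if the raw header contains a base64 ("B") encoded-word.
    return re.search(r"=\?[^?]+\?[Bb]\?", raw_subject) is not None


def decode_subject(raw_subject):
    # Turn a raw Subject header into readable text.
    parts = []
    for value, charset in decode_header(raw_subject):
        if isinstance(value, bytes):
            parts.append(value.decode(charset or "ascii", errors="replace"))
        else:
            parts.append(value)
    return "".join(parts)


def visible_text_length(html_body):
    # Count non-whitespace characters left after dropping the HTML tags.
    parser = _TextOnly()
    parser.feed(html_body)
    parser.close()
    return len("".join("".join(parser.chunks).split()))


def super_test_score(raw_subject, html_body, min_text=20):
    # One point per signal; min_text is an arbitrary, assumed threshold.
    score = 0
    if is_b_encoded(raw_subject):
        score += 1
    if visible_text_length(html_body) < min_text:
        score += 1
    return score
```

Run against the subject Dan quotes, `decode_subject` gives "Enjoy summer until its very end!", and that message scores only 1 as long as its body has real text. That is the point of combining the variables: the encoded subject alone is not enough to fail a message.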
