They're still a work in progress, of course, but most of the major sources of FPs (false positives) seem to have been fixed.

The major change is that both tests have been split into two files: one for positives, and one for counterbalancing false positives. This reduces the possibility of crediting too much back to any E-mail. It also makes testing a lot easier: any message that fails the main filter but doesn't fail the "anti" filter gets scored, while those that fail both don't.
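A minimal sketch of the main/anti scoring rule in Python, assuming each filter can be modeled as a list of regular expressions. The patterns below are hypothetical placeholders, not entries from the actual filter files, and Declude's own filter syntax is not reproduced here.

```python
import re
from typing import Pattern, Sequence


def matches_any(text: str, patterns: Sequence[Pattern[str]]) -> bool:
    """Return True if at least one pattern is found somewhere in the text."""
    return any(p.search(text) for p in patterns)


def should_score(text: str,
                 main: Sequence[Pattern[str]],
                 anti: Sequence[Pattern[str]]) -> bool:
    """Score only when the main filter fires and the anti filter does not."""
    return matches_any(text, main) and not matches_any(text, anti)


# Hypothetical example patterns (assumptions, not from the real filter files):
# MAIN flags a run of 6+ consonants; ANTI exempts anything containing an XML prolog.
MAIN = [re.compile(r"[bcdfghjklmnpqrstvwxz]{6,}", re.IGNORECASE)]
ANTI = [re.compile(r"<\?xml", re.IGNORECASE)]

print(should_score("Buy now xkcdqwrtzp", MAIN, ANTI))                 # True
print(should_score('<?xml version="1.0"?> xkcdqwrtzp', MAIN, ANTI))   # False
print(should_score("Hello there", MAIN, ANTI))                        # False
```

Treating the anti file as a veto on the main file's match, rather than as a separate negative score, is one way to read why the split limits how much can be credited back to a message.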

The GIBBERISHSUB filter is pretty much finished; the only things I expect to add are exceptions in the ANTIGIBBERISHSUB filter. Those exceptions should be for words, acronyms and stock market symbols, and they should match the corresponding exemptions in the ANTIGIBBERISH filter.
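To illustrate how ANTIGIBBERISHSUB-style exemptions interact with a subject check, here is a hypothetical sketch that flags subject tokens containing a run of five or more consonants unless the token is on an exemption list. The consonant-run heuristic, the threshold and the exemption entries are all assumptions for illustration, not the logic of the actual GIBBERISHSUB filter.

```python
import re
from typing import List

# Assumed heuristic: five or more consonants in a row ("y" treated as a vowel).
CONSONANT_RUN = re.compile(r"[bcdfghjklmnpqrstvwxz]{5,}")

# Hypothetical exemptions: real words or acronyms that contain long consonant
# runs. Not the actual ANTIGIBBERISHSUB list.
EXEMPTIONS = {"strengths", "twelfths", "https"}


def gibberish_tokens(subject: str) -> List[str]:
    """Return subject tokens that look like gibberish and are not exempt."""
    flagged = []
    for token in re.findall(r"[a-z]+", subject.lower()):
        if token in EXEMPTIONS:
            continue  # exempted word, acronym or symbol
        if CONSONANT_RUN.search(token):
            flagged.append(token)
    return flagged


print(gibberish_tokens("Our strengths this quarter"))  # []
print(gibberish_tokens("Cheap meds qwxtzplk"))         # ['qwxtzplk']
```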

The GIBBERISH filter similarly has ANTIGIBBERISH as a counterbalance. Some items are listed in both files when they only occasionally produce false positives, which makes them easier to monitor. The test will no longer interfere with BASE64, except that it will add extra score to any base64-encoded content that isn't labeled as such anywhere in the headers or message body. This is not a bad thing, because untagged base64 content is very highly indicative of spam.

I have also found that many spams are caught only because they contain gibberish in the MIME message boundary. Normal mail clients use time stamps, in either decimal or hexadecimal form, so they won't trip the test. Spammers also tend to put fake directories made of gibberish in their links, and this will detect that as well. Unfortunately, some legitimate mailers are random enough to get caught; those are being tracked in the "anti" file.
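The boundary check can be sketched as follows, assuming the same hypothetical consonant-run heuristic: boundary segments made only of decimal or hexadecimal digits (as time stamps are) are skipped, and any other segment with a long consonant run is flagged. It uses Python's standard `email` module to read the boundary parameter; the function name, the segment splitting and the threshold are illustrative assumptions, not how the GIBBERISH filter is actually written.

```python
import email
import re
from typing import Optional

# Assumed heuristic: five or more consonants in a row.
CONSONANT_RUN = re.compile(r"[bcdfghjklmnpqrstvwxz]{5,}", re.IGNORECASE)
# Decimal digits are a subset of hex digits, so one pattern covers both.
HEX_OR_DECIMAL = re.compile(r"^[0-9A-Fa-f]+$")


def boundary_looks_gibberish(boundary: Optional[str]) -> bool:
    """Flag a MIME boundary whose non-numeric segments contain gibberish."""
    if not boundary:
        return False
    for segment in re.split(r"[^A-Za-z0-9]+", boundary):
        if not segment or HEX_OR_DECIMAL.match(segment):
            continue  # time stamps in decimal or hex are skipped
        if CONSONANT_RUN.search(segment):
            return True
    return False


raw = (
    'Content-Type: multipart/alternative; boundary="----=_qzxwvbnrt_kjhg"\n'
    "\n"
    "body\n"
)
msg = email.message_from_string(raw)
print(boundary_looks_gibberish(msg.get_boundary()))                          # True
print(boundary_looks_gibberish("----=_NextPart_000_0012_01C37B5A.2F6E3A40"))  # False
```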

I haven't had time to massage the comments, but I wanted to put this out for testing because it resolves many of the false positives. Please let me know if you have nominations for counterbalancing measures, such as words, mail clients, bulk mailers, etc. Samples of the offending code are helpful, because a literal exception might not be the best way around a problem. For instance, I just took care of an MS Word mail issue by exempting XML tags instead of one particular string of characters.
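A rough sketch of why exempting a whole class of markup can work better than a literal exception: if tags are stripped before the gibberish check, Word-style namespace declarations such as `xmlns` no longer count, while gibberish in the visible text still does. The tag pattern and the consonant-run heuristic are assumptions for illustration, not the rule used in the actual ANTIGIBBERISH file.

```python
import re

# Assumed heuristic: five or more consonants in a row.
CONSONANT_RUN = re.compile(r"[bcdfghjklmnpqrstvwxz]{5,}", re.IGNORECASE)
# Simplified tag matcher: anything between < and > with no nested brackets.
TAG = re.compile(r"<[^<>]*>")


def has_gibberish(text: str) -> bool:
    """Naive check over the raw text, markup included."""
    return bool(CONSONANT_RUN.search(text))


def has_gibberish_outside_tags(text: str) -> bool:
    """Check only the text outside tags; each tag becomes a space so
    words on either side are not glued together."""
    return has_gibberish(TAG.sub(" ", text))


word_html = '<html xmlns:o="urn:schemas-microsoft-com:office:office"><p>Hello</p></html>'
print(has_gibberish(word_html))                        # True ("xmlns" is a 5-consonant run)
print(has_gibberish_outside_tags(word_html))           # False
print(has_gibberish_outside_tags("<p>qzxwvbnrt</p>"))  # True
```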

You can download those filters plus the OBFUSCATION filter at the following locations:


GIBBERISH and ANTIGIBBERISH:
http://www.mailpure.com/decludefilters/gibberish/Gibberish_09-15-2003.txt
http://www.mailpure.com/decludefilters/gibberish/AntiGibberish_09-15-2003.txt

GIBBERISHSUB and ANTIGIBBERISHSUB:
http://www.mailpure.com/decludefilters/gibberishsub/GibberishSub_09-15-2003.txt
http://www.mailpure.com/decludefilters/gibberishsub/AntiGibberishSub_09-15-2003.txt

OBFUSCATION:
http://www.mailpure.com/decludefilters/obfuscation/Obfuscation_09-14-2003c.txt


Recommendations on how best to obscure the files long-term would be appreciated. It shouldn't be anything too convoluted, like maybe a secret handshake or something :)


Matt


---
This E-mail came from the Declude.JunkMail mailing list.  To
unsubscribe, just send an E-mail to [EMAIL PROTECTED], and
type "unsubscribe Declude.JunkMail".  The archives can be found
at http://www.mail-archive.com.
