Bill, so does it make sense or not? :)

John did point out the basis for why ANTI filters are necessary with his example, but for everyone's sake I would like to expand on this, especially since it took me a while to conclude that I should use such an architecture on many of these filters.

1) Excluding sources of false positives increases the filter's reliability, and therefore the weight you can assign to it in a weighted environment. There's a big difference between how I score a test with a 99% success rate and one with a 99.9% success rate, and the closer to 100%, the better. The Y!DIRECTED filter is a prime example of this. It uses many counterbalances to protect against potential sources of false positives, and with these counterbalances I feel it is safe to score that filter just above my fail weight and let negative tests bring it back down in the event of an FP (which I have yet to see with the current configuration). Without the counterbalances, the test would probably tag about twice as many messages as it does now, and that would surely increase my overall FP rate, even if it were scored at less than half of its current weight. Our first job, of course, is to deliver the good mail, not just block the bad.

I will soon start sharing ANTI filters for the BASE64 and BADHEADERS tests for similar reasons.

I hope this helps you and others understand the methodology a little better. I always welcome whatever feedback you or others might have. John's suggestion of counterbalancing for parts in the GIBBERISH filter, for instance, led me to create a long and quite useful list of terms indicative of legitimate gibberish, which significantly strengthens the filter and kept me from lowering the recommended weight (at least for now).

Matt

Bill Landry wrote:

Okay, that makes sense. Thanks for the explanation, John.
However, I'm wondering whether, in a weighted environment, it makes sense to take on the additional overhead of also processing each message through the "anti" file, especially if you are only applying a relatively low weight to messages the filter tags. I guess if a message is close to reaching the hold weight, the "anti" weight could help it get delivered, but that would need to be weighed against the added workload, especially when it comes to body checks.

Bill

----- Original Message -----
From: "John Tolmachoff (Lists)" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Tuesday, October 28, 2003 8:41 AM
Subject: RE: [Declude.JunkMail] GIBBERISH v1.0.5 available, plus new versions of others

Bill,

I will see if I can explain it. GIBBERISH lists codes. Well, part numbers sometimes include letter codes, so a legitimate e-mail containing a lettered part number may be caught by GIBBERISH. Therefore, in ANTIGIBBERISH, you included the safe word "part". However, now every message that does not fail GIBBERISH but includes the word "part" will fail ANTIGIBBERISH and have weight subtracted. You don't want that, so the same safe word "part" also goes into GIBBERISH.

John Tolmachoff
Engineer/Consultant/Owner
eServices For You

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:Declude.JunkMail-[EMAIL PROTECTED]] On Behalf Of Bill Landry
Sent: Tuesday, October 28, 2003 8:28 AM
To: [EMAIL PROTECTED]
Subject: Re: [Declude.JunkMail] GIBBERISH v1.0.5 available, plus new versions of others

Matt,

I'm trying to understand the logic behind including content you don't want to block in both the capture and anti-capture files. Why add the extra processing required to parse both files, tagging some content and then tagging it again with the same negative weight, just so it ends up with zero weight again?
If you don't tag it with a positive weight in the first place, you won't need to tag it again with a negative weight, and you still get the same end result (content with zero weight applied to it) while saving the CPU cycles. I must be missing something here...?

Bill

----- Original Message -----
From: "Matthew Bramble" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Tuesday, October 28, 2003 5:14 AM
Subject: [Declude.JunkMail] GIBBERISH v1.0.5 available, plus new versions of others

Ok, the GIBBERISH filter is finally updated. It has a number of tweaks since I last shared the file publicly. Among the changes are exclusions for mail clients, based either on their behavior or on some identifier; additional word and acronym exclusions, a couple of which are quite common (QTR, for instance); and something that I believe makes the filter much less prone to false positives on e-mail: a group of words indicative of auto-generated codes, from part numbers to passwords. The trick with this last addition is that while a customer number, for instance, probably won't include one of the offending strings this filter looks for, its presence indicates the type of message that has a much higher chance of becoming an FP on this test. I expect this change to produce far fewer FPs, based on a cursory review of a month's worth of monitoring (though not always attentive).

Although adding the "Auto-generated Codes" counterbalances to the GIBBERISHSUB filter isn't going to make a big impact, I included them anyway just to be safe, and added a few additional word exclusions. I also added another method of detecting forwarded/attached messages to Y!DIRECTED to further reduce the possibility of FPs. So all three filters were updated this morning.

MailPure :: Filter Software :: Declude Filters
http://www.mailpure.com/software/decludefilters/

Please keep reporting bugs when you find them.
I'll work on updating the DYNAMIC filter next.

Enjoy,

Matt

---
[This E-mail was scanned for viruses by Declude Virus (http://www.declude.com)]

---
This E-mail came from the Declude.JunkMail mailing list. To unsubscribe, just send an E-mail to [EMAIL PROTECTED], and type "unsubscribe Declude.JunkMail". The archives can be found at http://www.mail-archive.com.
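To illustrate John's explanation of why the safe word "part" goes into both GIBBERISH and ANTIGIBBERISH, here is a minimal Python sketch of a simplified weighted-scoring model. It is not Declude's filter syntax or engine. It assumes each test fires at most once when any of its patterns matches and then contributes a fixed weight. The `WeightedTest` class, the `total_weight` function, the consonant-run pattern, the sample messages, and the weight of 10 are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class WeightedTest:
    """Hypothetical weighted test: adds `weight` once if any pattern matches."""
    name: str
    weight: int
    patterns: list[str]

    def fires(self, text: str) -> bool:
        return any(re.search(p, text, re.IGNORECASE) for p in self.patterns)


def total_weight(tests: list[WeightedTest], text: str) -> int:
    """Sum the weights of every test that fires on the message text."""
    return sum(t.weight for t in tests if t.fires(text))


# Illustrative patterns only, not the real GIBBERISH/ANTIGIBBERISH contents.
GIBBERISH_CODE = r"\b[bcdfghjklmnpqrstvwxz]{6,}\b"  # run of 6+ consonants
SAFE_WORD = r"\bpart\b"                             # counterbalance term
WEIGHT = 10                                         # hypothetical weight

# Safe word only in the ANTI test.
anti_only = [
    WeightedTest("GIBBERISH", WEIGHT, [GIBBERISH_CODE]),
    WeightedTest("ANTIGIBBERISH", -WEIGHT, [SAFE_WORD]),
]

# Safe word in both tests, as John describes.
both = [
    WeightedTest("GIBBERISH", WEIGHT, [GIBBERISH_CODE, SAFE_WORD]),
    WeightedTest("ANTIGIBBERISH", -WEIGHT, [SAFE_WORD]),
]

messages = {
    "code + part": "Please reorder part XKCDJQZ today",
    "part only": "The part shipped yesterday",
    "code only": "Buy now qwrtzxpl cheap",
}

for label, text in messages.items():
    print(f"{label:12} anti-only={total_weight(anti_only, text):4} "
          f"both={total_weight(both, text):4}")
```

With the safe word only in the ANTI test, the "part only" message ends up at -10, so mail that never failed GIBBERISH still has weight subtracted. With the safe word in both tests it nets to 0, while the "code only" message still scores +10. This net-zero result is exactly what Bill questions on CPU grounds; the sketch does not measure that overhead.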
