Bill, so does it make sense or does it not?  :)

John did point out the basis for why ANTI filters are necessary with his example, but for the sake of all, I would like to expand on this, especially since it took me a bit of time before I came to the conclusion to use such an architecture on many of these filters.

1) Excluding sources of false positives increases the filter's reliability and therefore the score that you can apply to it in a weighted environment.  There's a big difference between how I score a test with a success rate of 99% and one with 99.9%, and the closer to 100% the better.  The Y!DIRECTED filter, for instance, is a prime example of this.  It makes use of many counterbalances to protect against potential sources of false positives, and with these counterbalances, I feel that it is safe to score that filter at just above my fail weight and let negative tests take it down in the event of an FP (which I have yet to see with the current configuration).  Without the counterbalances, the test would probably end up scoring about double the number of messages it does now, and surely that would increase my overall FP rate even if it were scored at less than half of what it is now.  Our first job, of course, is to deliver the good mail, not just block the bad.  I will soon start to share ANTI filters for the BASE64 and BADHEADERS tests for similar reasons.

2) Processing the additional lines in these ANTI filters is a non-issue for the vast majority of Declude users because most systems aren't running near capacity, in which case no noticeable delay or other adverse effects are seen.  I also believe that spam fighting is going to become much more processor intensive as products migrate further away from one-to-one matching to many-to-many matching in weighted systems.  It makes sense for me to share filters in the most globally useful manner; however, that shouldn't stop administrators from doing their own tweaks, or even cutting out wasted space from the extensive commenting and redundancies that may exist (such as the obfuscation tests in the Y!DIRECTED filter).

3) Placing exceptions in a single filter system creates the distinct possibility of crediting back too many points, without any regard to whether the message is spam or legit.  Some of these filters have such wide counterbalancing measures, such as credits for replies and forwards, which are absolutely necessary for limiting false positives, that it would have an overall detrimental effect on your system if you chose to counterbalance within a single file with multiple exceptions.

4) Using ANTI filters makes scoring much easier (and I'm sure you of all of us would appreciate that).  Determining if one of these filters assessed a score is as simple as seeing if the ANTI filter also got tripped, and there should never be an instance where only the ANTI filter gets tripped.

5) While it's not the utopian way of doing things, it is necessary given the current set of tools.  I would imagine that eventually Scott will give us a way to defeat a filter instead of just subtracting points for hits, and also have the filter not get logged as a hit when that happens.  That would certainly end the need for these ANTI filters except when they are used to counterbalance the tests that come with Declude.

6) And my favorite point...It works for me :)
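
As a rough illustration of point 4, the sketch below shows how the tests logged for one message might be read when a filter is paired with an ANTI filter.  It is not Declude syntax and does not parse a real Declude log; the function name, the set-of-test-names input, and the "ANTI" + name pairing are all assumptions made only for this example.

```python
def interpret_hits(tests_failed, filter_name):
    """Classify how a filter and its ANTI counterpart affected one message.

    tests_failed is a set of test names the message tripped. This is a
    hypothetical input shape, not an actual Declude log format.
    """
    anti_name = "ANTI" + filter_name  # assumed naming, e.g. ANTIGIBBERISH
    hit = filter_name in tests_failed
    anti_hit = anti_name in tests_failed

    if hit and anti_hit:
        return "counterbalanced"  # points were added, then credited back
    if hit:
        return "scored"           # the filter's weight stands
    if anti_hit:
        return "misconfigured"    # the ANTI filter should never trip alone
    return "not tripped"


print(interpret_hits({"GIBBERISH", "ANTIGIBBERISH"}, "GIBBERISH"))  # counterbalanced
print(interpret_hits({"GIBBERISH"}, "GIBBERISH"))                   # scored
print(interpret_hits({"ANTIGIBBERISH"}, "GIBBERISH"))               # misconfigured
```

The "misconfigured" case is the one that should never appear if every safe term in the ANTI filter is also present in the main filter.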

I hope this helps you and others understand the methodology a little better.

I always welcome whatever feedback you or others might have.  John's suggestion for counterbalancing for parts in the GIBBERISH filter, for instance, led me to create a long and quite useful list of terms indicative of legitimate gibberish, which significantly strengthens the filter and stopped me from dropping the recommended scoring (at least for now).

Matt



Bill Landry wrote:
Okay, that makes sense.  Thanks for the explanation, John.

However, I'm just wondering if it makes sense in a weighted environment to
worry about the additional overhead of also processing each message through
the "anti" file, especially if you are only applying a relatively low weight
to messages that are tagged by the filter?  I guess if a message is close to
reaching the hold weight, the "anti" weight could help it to get
delivered, but I guess that would need to be weighed against the added
work-load, especially when it comes to body checks.

Bill
----- Original Message ----- 
From: "John Tolmachoff (Lists)" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Tuesday, October 28, 2003 8:41 AM
Subject: RE: [Declude.JunkMail] GIBBERISH v1.0.5 available, plus new
versions of others


Bill, I will see if I can explain it.

The GIBBERISH filter lists codes. Well, part numbers sometimes include
letter codes. So a legit e-mail that has a lettered part number may be
caught by GIBBERISH. Therefore, in ANTIGIBBERISH, you include the safe
word, part. However, now every message that does not fail GIBBERISH but
includes the word part will fail ANTIGIBBERISH and have weight subtracted.
You do not want to do that, so the same safe word, part, goes into GIBBERISH.
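
A small model may make the reasoning above easier to follow.  It is only a sketch under stated assumptions: the gibberish strings and the weight of 5 are made up, matching is plain substring matching, and each filter is assumed to contribute its weight at most once per message.  None of this is Declude's actual filter syntax or scoring logic.

```python
GIBBERISH_STRINGS = {"xqzj", "kvwq"}  # made-up strings standing in for the real list
SAFE_WORDS = {"part"}                 # the safe word from the example above
WEIGHT = 5                            # arbitrary weight, assumed equal for both filters


def tripped(body, patterns):
    """Return True if any pattern occurs in the message body."""
    text = body.lower()
    return any(p in text for p in patterns)


def net_weight(body, safe_word_in_both=True):
    """Net weight from GIBBERISH plus ANTIGIBBERISH in this simplified model."""
    gibberish = set(GIBBERISH_STRINGS)
    if safe_word_in_both:
        gibberish |= SAFE_WORDS       # safe word also listed in GIBBERISH
    total = 0
    if tripped(body, gibberish):
        total += WEIGHT               # GIBBERISH adds weight
    if tripped(body, SAFE_WORDS):
        total -= WEIGHT               # ANTIGIBBERISH credits it back
    return total


legit = "Please send the part list"
print(net_weight(legit, safe_word_in_both=False))  # -5: ANTI trips alone
print(net_weight(legit, safe_word_in_both=True))   # 0: both trip and cancel
print(net_weight("xqzj buy now"))                  # 5: only GIBBERISH trips
```

In this model, listing the safe word in both filters is what keeps a clean message from being credited points it never earned.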

John Tolmachoff
Engineer/Consultant/Owner
eServices For You


  
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:Declude.JunkMail-
[EMAIL PROTECTED]] On Behalf Of Bill Landry
Sent: Tuesday, October 28, 2003 8:28 AM
To: [EMAIL PROTECTED]
Subject: Re: [Declude.JunkMail] GIBBERISH v1.0.5 available, plus new
versions of others

Matt, I'm trying to understand the logic behind including content you don't
want to block in both the capture and anti-capture files.  Why add the extra
processing required to parse both files to tag some content and then tag it
again with the same negative weight, just so it can end up with a zero
weight again?  If you don't tag it with a positive weight in the first
place, you will not need to tag it again with a negative weight, and you
still have the same end result, content that has zero weight applied to it,
and you've saved the CPU cycles.

I must be missing something here...?

Bill
----- Original Message -----
From: "Matthew Bramble" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Tuesday, October 28, 2003 5:14 AM
Subject: [Declude.JunkMail] GIBBERISH v1.0.5 available, plus new versions
of others


    
Ok, the GIBBERISH filter is finally updated.  It has a bunch of tweaks
from the time when I last shared the file publicly.  Among the changes
are exclusions for mail clients by way of either their behaviors or by
way of some identifier, additional word and acronym exclusions a couple
of which are pretty common (QTR for instance), and something that I
believe makes the filter much less apt to false positive on E-mail:
a group of words that are indicative of auto-generated codes, from part
numbers to passwords.  The trick with the last part is that while a
customer number for instance probably won't include one of the offending
strings that this filter looks for, it's indicative of the type of
message that has a much higher chance of becoming an FP on this test.
I'm expecting with this change that I will see far fewer FP's, which I
base on a cursory review of a month's worth of monitoring (though not
always attentively).

Although adding the "Auto-generated Codes" counterbalances to the
GIBBERISHSUB filter isn't going to make a big impact, I went ahead and
included it anyway just to be safe, and added a few additional word
exclusions.  I also added an additional method of detecting
forwarded/attached messages to Y!DIRECTED in order to further prevent
the possibility of FP'ing.

So, all three filters were updated this morning.

   MailPure :: Filter Software :: Declude Filters
   http://www.mailpure.com/software/decludefilters/

Please keep reporting bugs when you find them.  I'll work on updating
the DYNAMIC filter next.

Enjoy,

Matt

