Re: [Declude.JunkMail] Filtering Question...

Chuck Schick Mon, 15 Dec 2003 21:58:55 -0800

Matt:

Thanks for your insight.  I have been trying for two years to get in front of the spam 
curve but have found it to be an ever-changing landscape that is hard to stay on top 
of.  We have seen our spam load increase at least tenfold in the past two years.  The 
challenge is that our legitimate email customers have also increased significantly 
in that period, and I feel the number one objective is to deliver the 
legitimate mail to them.


Every time we add a spam test, it also increases the false positives.  It has gotten to 
the point where we need to counterweight some of the known issues.  I prefer a 
counterweight (negative filter value) to out-and-out whitelisting.  I believe 
whitelisting by email address or domain should be a last resort.
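
As a rough illustration of the counterweight idea, here is a minimal Python 
sketch.  It is not Declude's scoring code; the test names, the weights, and the 
fail weight of 20 are all made up for the example.

```python
# Hypothetical weights: positive values add to the spam score,
# the negative one is a counterweight for a known sender.
WEIGHTS = {
    "RBL_HIT": 15,
    "SPAMDOMAINS": 8,
    "BANNED_WORD": 7,
    "KNOWN_SENDER": -10,  # counterweight, not a whitelist
}
FAIL_WEIGHT = 20  # assumed threshold at which mail is held


def total_weight(tests_failed):
    """Sum the weights of every test a message triggered."""
    return sum(WEIGHTS[name] for name in tests_failed)


def is_spam(tests_failed, whitelisted=False):
    """A whitelist skips scoring entirely; a counterweight only shifts the total."""
    if whitelisted:
        return False
    return total_weight(tests_failed) >= FAIL_WEIGHT


# Two false positives on a known sender: 15 + 8 - 10 = 13, so it is delivered.
known_sender_fp = ["RBL_HIT", "SPAMDOMAINS", "KNOWN_SENDER"]
# A forgery that also fails the content test: 15 + 8 + 7 - 10 = 20, still held.
forged_sender = ["RBL_HIT", "SPAMDOMAINS", "BANNED_WORD", "KNOWN_SENDER"]

print(is_spam(known_sender_fp), is_spam(forged_sender))
```

With an outright whitelist the forged message would be delivered no matter how 
many tests it failed, which is the reason for preferring the counterweight.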

I agree with much of what you have stated (the parts I do not fully agree with are 
simply because I have not fully studied them yet).  As for programmatic filtering, we 
have been using Spamchk for two months now and have been very happy with the results - 
it has probably moved us into the high 90% range in eliminating spam.

One thing I see is that certain tests cause more false positives than others.  
Spamdomains is an example of a test that I am strongly thinking of dropping - it 
probably causes more false positives than any other test.  Too many times people 
sending legitimate emails use a Reply-To address that is not in the same domain as 
the one they are sending from.  So I would like to use more programmatic filtering and 
counterbalances to get 99% rejection (we are there) and less than 0.3% FP (we are 
not there).
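
One way to check which test is the worst offender is to tally, per test, how 
many of its hits landed on legitimate mail.  The sketch below assumes a 
hand-reviewed sample of messages; the log entries and test names are 
hypothetical, not real data.

```python
from collections import Counter

# Hypothetical review log: (tests the message failed, was it actually spam?)
REVIEWED = [
    (["SPAMDOMAINS"], False),
    (["SPAMDOMAINS", "RBL_HIT"], True),
    (["SPAMDOMAINS"], False),
    (["RBL_HIT"], True),
    (["BANNED_WORD", "RBL_HIT"], True),
    (["BANNED_WORD"], False),
]


def false_positive_share(reviewed):
    """For each test, return the fraction of its hits that were on legitimate mail."""
    hits, fp_hits = Counter(), Counter()
    for tests, was_spam in reviewed:
        for name in tests:
            hits[name] += 1
            if not was_spam:
                fp_hits[name] += 1
    return {name: fp_hits[name] / hits[name] for name in hits}


shares = false_positive_share(REVIEWED)
worst = max(shares, key=shares.get)  # test with the highest false positive share
print(worst, shares)
```

A test with a high share is a candidate for a lower weight or a counterbalance 
before it is dropped outright.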


Chuck Schick


---------- Original Message ----------------------------------
From: Matthew Bramble <[EMAIL PROTECTED]>
Reply-To: [EMAIL PROTECTED]
Date: Mon, 15 Dec 2003 21:52:57 -0500

>Chuck,
>
>There are several different general uses for custom filtering.  
>Matt's School of Thought would teach the following:
>
>1) Programmatic filtering.  This is more like pattern matching with 
>custom filters.  Patterns can be as simple as the country of origin, or 
>more complex like gibberish inserted into spam in order to throw off 
>some products.  These filters can be highly effective at targeting crud 
>spammers, even when they find a perfectly clean IP address.  These guys 
>often try multiple types of obfuscation in each message, and it's the 
>techniques that give them away instead of the content.  You can download 
>a bunch of filters from my site, 
>www.mailpure.com/software/decludefilters/ , and search the archives for 
>versions of OBFUSCATION, DYNAMIC, PEXICOM, FORGEDHELO-IP, 
>FORGEDHELP-FDQN, FORGEDASLOCAL, SPAMDOMAINS, and last week's "New fraud 
>exploit".  There are other examples as well that appear now and then.
>
>2) Banned words list.  These should be scored fairly low, but some words 
>are highly indicative of spam, for instance the various drugs that are 
>advertised, or terms related to sex, printer cartridges, anti-virus 
>products, fraud and scams, etc.  You can categorize these in one single 
>file, and score each entry independently.  You can also add words to the 
>list as you discover false negatives that get through your system.  This 
>need not be a very large list; in fact, I make do quite well with maybe 
>50 such entries, though I could pay a bit more attention to it.  
>Spammers will obfuscate problematic words, which means that the entries 
>themselves may cause more FP's than P's.
>
>3) Pseudo-whitelist.  This is a very useful file to have in order to 
>mitigate the effects of false positives from tests.  Every system out 
>there makes a subconscious attempt to deem what a normal score is, and 
>it's not necessary to counterbalance every last point that might be 
>scored from every last test...otherwise we would be blocking on every 
>RBL and whitelisting with every filter.  I really don't get concerned 
>about false positives on E-mails until they start to score consistently 
>at 70% of my fail weight, and then I take action on them by listing them 
>in this filter.  My pseudo-whitelist is much larger than my own 
>blocklist because I add a listing to it every time I encounter a false 
>positive as a result of an RBL or external test.  I do differentiate 
>between responsible bulk mailers, direct senders, and those that come 
>from neither.
>
>4) Pseudo-blacklist.  This is mostly what Kami has done by building a 
>list of identifiers for what he considers to be spam.  In many cases he 
>lists multiple types of information, probably in the off chance that one 
>piece changes, but the others remain trackable.  The downside of 
>tracking multiple pieces is that FP's can occur with multiple elements.  
>I personally keep two filters for this use: one is IP based (uses IPFILE 
>functionality) and the other is based on a range of things.  It all 
>depends on what I deem to be a reliable identifier, but I group them by 
>identifier.  If I consider a source to be spam and it's not the crud type 
>of spam that comes from open relays or zombied machines (so it can be 
>tracked by way of some identifier, whereas the crud type will even throw away 
>domains after a few days), then I throw it in that file.  I don't add a 
>lot of this stuff because most of the static spammers tend to be well 
>blocked by the RBL's, though I must block something if a customer asks 
>me to.  This becomes resource intensive if your file(s) grow too large 
>and can be hard to maintain, i.e. how do you expire listings.
>
>Now as far as the pros and cons of using a particular data element for 
>pseudo-whitelisting goes, you want to use the hardest to spoof piece of 
>data that is reliable.  The IP is the hardest, but it is rarely tracked 
>due to the difficulty in maintaining this information.  REVDNS is the 
>next best; however, it is sometimes spoofed with major ISP's and 
>ecommerce sites.  Data elements like HELO and MAILFROM are easily and 
>often spoofed, and should be used as a last resort.  You might even be 
>forced to use HEADERS to search for an address that appears as the from, 
>but not the MAILFROM, or in the event that you are counterbalancing an 
>external test such as Message Sniffer, you might need to list URL's in a 
>BODY filter since they will often track such things, and while you might 
>get something through originally with a REVDNS counterbalance, a reply 
>or forward of the same content could still trip Sniffer based on the 
>content of the message.
>
>A recent issue highlights the decision making process required for 
>pseudo-whitelisting.  I had a FP reported to me from a pay site that 
>sends out daily newsletters.  This company uses a third-party delivery 
>service which has a big problem with spammers and is even listed on SBL, 
>though they also managed to get listed in Bonded Sender (both of which 
>seem inappropriate).  The REMOTEIP, REVDNS, HELO and MAILFROM are all from 
>this untrusted third-party, however the From address (which isn't 
>trackable in Declude currently), is unique to this sender, having their 
>domain listed.  So in order to allow them through in a reliable way, I 
>chose a header filter that reads as follows:
>
>    HEADERS      -15      CONTAINS      @some-domain.com>
>
>Most entries, of course, get listed by REVDNS, and I plan on 
>starting an IPFILE for pseudo-whitelisting trusted bulk mailers, 
>ecommerce companies, and ISP's, primarily because they might be spoofed 
>and this protects from that.  I've never seen an IP spoofed on the last 
>hop, though you have to be very careful about this on multiple hop scanning.
>
>Matt
>
>
>
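
Matt's ranking of data elements for pseudo-whitelisting, quoted above, can be 
sketched roughly as follows.  This is an illustration in Python, not a Declude 
feature; the function name is hypothetical, and placing HEADERS after MAILFROM 
is one reading of his "you might even be forced to use HEADERS" remark.

```python
# Preference order taken from the quoted description: hardest to spoof first.
SPOOF_RESISTANCE = ["REMOTEIP", "REVDNS", "HELO", "MAILFROM", "HEADERS"]


def pick_identifier(reliable):
    """Return the hardest-to-spoof element that reliably identifies the sender.

    `reliable` is the set of elements that are unique to this sender.
    Returns None when nothing usable is available.
    """
    for element in SPOOF_RESISTANCE:
        if element in reliable:
            return element
    return None


# The newsletter case: REMOTEIP, REVDNS, HELO and MAILFROM all belong to the
# untrusted third-party delivery service, so only the From header is usable.
newsletter = {"HEADERS"}
# A typical direct sender with its own reverse DNS and MAILFROM domain.
direct_sender = {"REVDNS", "MAILFROM", "HEADERS"}

print(pick_identifier(newsletter), pick_identifier(direct_sender))
```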

---
[This E-mail was scanned for viruses by Declude Virus (http://www.declude.com)]

---
This E-mail came from the Declude.JunkMail mailing list.  To
unsubscribe, just send an E-mail to [EMAIL PROTECTED], and
type "unsubscribe Declude.JunkMail".  The archives can be found
at http://www.mail-archive.com.
