Do you use this in addition to, or in place of, the test listed earlier (GibberishSub.txt)?

----- Original Message -----
From: "Matthew Bramble" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Friday, September 12, 2003 2:41 PM
Subject: [Declude.JunkMail] Gibberish body detector + inline Base64

> I've been testing this for almost a day and have had very good results
> with this filter as it is catching spam all the time...over 1/3 of my
> total mail volume is being tagged in fact.
>
> Here's how it works. Like the Gibberish subject test, this searches for
> strings of characters not found commonly in communications. Since
> Base64 encoding has to be scanned with text filters at this time, the
> filter will automatically trip on any Base64 content because of how
> common strings with Q are in the encoding. In order to offset this
> effect, it searches for "attachment;" which is required for any
> non-inline content, and gives back points. Since this code isn't
> associated with inline Base64 content, it won't get tripped there and
> has the net effect of acting just like Declude's BASE64 test. If you
> test this out, you are advised to reduce the score of BASE64 by the
> exact score of this test. Again, this test gets tripped by all
> attachments, but it doesn't change their score. I've found that inline
> BASE64 only accounts for less than 20% of the hits.
>
> If you don't use BASE64 test because of foreign languages or other
> similar issues, that test can be scored negatively in order to offset
> the effects of the inline detection by this filter so that only
> displayable text and HTML will produce a change in score. That includes
> non-displayable gibberish text in brackets.
>
> False positives are bound to happen, however their occurrence is fairly
> low. Since HTML code is also searched, it will find matches in some
> URL's, especially ones with a tracking capability such as those used by
> Yahoo!
> Groups (in the ad sent with listserv postings) and Buy.com, and
> even less often it will find a match in regular wording, primarily with
> the use of acronyms. I'm very interested in hearing about more FP's if
> you find them.
>
> The filter is designed to be used with v1.75 of Declude with
> decoding left turned on (the default). It can be modified to work with older
> versions of Declude by changing the "attachment;" offset to "base64" in
> which case it won't detect any Base64 unless it is not appropriately
> tagged (useful).
>
> I think this is a killer test. Enjoy.
>
> Matt
>
>
> --------------------------------------------------------------------------------
> # GIBBERISH
> # Last Update: 09/12/2003
> #
> # Description:
> # Finds gibberish in the body of the message, including comment blocks. Will be triggered on
> # any Base64 encoding due to how common Q combinations are. A negative weight for attachments
> # defeats the test, however inline base64 encoded content will receive full scoring. The BASE64
> # test should be reduced by the score of this test in order to compensate for this fact.
> #
> # Usage:
> # GIBBERISH filter C:\IMail\Declude\Gibberish.txt x 5 0
> #
> # False Positives
> # Will result primarily from URL's containing random looking strings. Known offenders include
> # Buy.com and Yahoo! Groups.
>
>
>
> # The following defeats the test if it finds an attachment.
>
> BODY -5 CONTAINS attachment;
>
>
> # Small list of letter combinations not found in a basic dictionary.
> > BODY 0 CONTAINS qb > BODY 0 CONTAINS qc > BODY 0 CONTAINS qd > BODY 0 CONTAINS qf > BODY 0 CONTAINS qg > BODY 0 CONTAINS qh > BODY 0 CONTAINS qi > BODY 0 CONTAINS qj > BODY 0 CONTAINS qk > BODY 0 CONTAINS qm > BODY 0 CONTAINS qn > BODY 0 CONTAINS qo > BODY 0 CONTAINS qp > BODY 0 CONTAINS qr > BODY 0 CONTAINS qs > BODY 0 CONTAINS qt > BODY 0 CONTAINS qv > BODY 0 CONTAINS qx > BODY 0 CONTAINS qy > BODY 0 CONTAINS qz > > BODY 0 CONTAINS vq > BODY 0 CONTAINS wq > BODY 0 CONTAINS tq > BODY 0 CONTAINS jq > > BODY 0 CONTAINS xd > BODY 0 CONTAINS xj > BODY 0 CONTAINS xk > BODY 0 CONTAINS xr > BODY 0 CONTAINS xz > > BODY 0 CONTAINS zb > BODY 0 CONTAINS zc > BODY 0 CONTAINS zf > BODY 0 CONTAINS zj > BODY 0 CONTAINS zk > BODY 0 CONTAINS zm > BODY 0 CONTAINS zx --- [This E-mail was scanned for viruses by Declude Virus (http://www.declude.com)] --- This E-mail came from the Declude.JunkMail mailing list. To unsubscribe, just send an E-mail to [EMAIL PROTECTED], and type "unsubscribe Declude.JunkMail". The archives can be found at http://www.mail-archive.com.
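
For anyone who wants to try the scoring logic outside of Declude before deploying it, here is a rough Python sketch of how the filter above nets out. It is not Declude code and makes several assumptions that the post does not confirm: that matching is case-insensitive, that any hit among the weight-0 lines adds the test weight once (5 in the usage line), and that "attachment;" contributes its -5 once. The function name gibberish_score and its parameters are made up for illustration.

```python
# Letter pairs copied from the filter in the post.
PAIRS = (
    "qb qc qd qf qg qh qi qj qk qm qn qo qp qr qs qt qv qx qy qz "
    "vq wq tq jq "
    "xd xj xk xr xz "
    "zb zc zf zj zk zm zx"
).split()


def gibberish_score(body, test_weight=5, attachment_offset=-5):
    """Approximate the net score the GIBBERISH filter would assign.

    Assumptions (not confirmed by the post): matching ignores case,
    the weight-0 lines add test_weight once if any of them match, and
    the "attachment;" line adds attachment_offset once.
    """
    text = body.lower()
    score = 0
    # Offset line: gives points back when a real attachment is present.
    if "attachment;" in text:
        score += attachment_offset
    # Gibberish lines: one hit is enough to trip the test.
    if any(pair in text for pair in PAIRS):
        score += test_weight
    return score


if __name__ == "__main__":
    samples = {
        "plain text": "Hello, see you tomorrow",
        "inline base64": "Content-Transfer-Encoding: base64\n\nPGh0bWw+Qk9EWQ==",
        "attachment": (
            "Content-Disposition: attachment; filename=a.zip\n\n"
            "UEsDBBQAAAAIAHqB"
        ),
    }
    for label, body in samples.items():
        print(label, gibberish_score(body))
```

Under those assumptions, ordinary text scores 0, inline Base64 picks up the full weight, and a message with a real attachment cancels out to 0, which matches the behaviour described in the post. A message that contains "attachment;" but trips none of the letter pairs would come out at -5 in this sketch; whether Declude behaves the same way is something to check against its documentation.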