Here's how it works. Like the Gibberish subject test, this searches for strings of characters that rarely appear in normal communications. Since Base64 encoding currently has to be scanned with text filters, the filter will automatically trip on any Base64 content because of how common strings with Q are in the encoding. To offset this effect, it searches for "attachment;", which is required for any non-inline content, and gives the points back. Since that string isn't associated with inline Base64 content, the offset doesn't kick in there, and the net effect is that the filter acts just like Declude's BASE64 test. If you try this out, I'd advise reducing the score of BASE64 by the exact score of this test. Again, this test gets tripped by all attachments, but it doesn't change their score. I've found that inline Base64 accounts for less than 20% of the hits.
If you don't use the BASE64 test because of foreign languages or other similar issues, that test can be scored negatively to offset the effect of this filter's inline detection, so that only displayable text and HTML will produce a change in score. That includes non-displayable gibberish text in brackets.
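As a rough illustration of the net scoring described above, here is a small Python sketch. It is a simplified model, not Declude's actual engine: the function name gibberish_score is made up for this example, matching is assumed to be case-insensitive, and the weights (5 for a hit, -5 for "attachment;") are taken from the usage line and the offset line of the filter.

```python
# Simplified model of the filter's net effect. This is NOT Declude's engine;
# it only mirrors the scoring described in the post.

# Letter combinations copied from the filter file.
DIGRAPHS = (
    "qb qc qd qf qg qh qi qj qk qm qn qo qp qr qs qt qv qx qy qz "
    "vq wq tq jq xd xj xk xr xz zb zc zf zj zk zm zx"
).split()

TEST_WEIGHT = 5         # from the usage line: "... Gibberish.txt x 5 0"
ATTACHMENT_OFFSET = -5  # from the filter line: "BODY -5 CONTAINS attachment;"


def gibberish_score(body):
    """Return the net points this simplified model gives a message body."""
    text = body.lower()  # assumption: CONTAINS matching ignores case
    score = 0
    if any(d in text for d in DIGRAPHS):
        score += TEST_WEIGHT  # some gibberish combination was found
    if "attachment;" in text:
        score += ATTACHMENT_OFFSET  # the offset gives the points back
    return score
```

In this model a body with a random-looking string and no "attachment;" nets 5, while the same string inside a message carrying "attachment;" nets 0, which matches the behavior described for attachments versus inline Base64.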
False positives are bound to happen, but their occurrence is fairly low. Since HTML code is also searched, the filter will find matches in some URLs, especially ones with tracking capability such as those used by Yahoo! Groups (in the ad sent with listserv postings) and Buy.com. Even less often, it will find a match in regular wording, primarily through the use of acronyms. I'm very interested in hearing about more FPs if you find them.
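When chasing down a false positive, it can help to see exactly which combinations matched. The following Python sketch does that outside of Declude. The helper name matched_digraphs and the sample strings are hypothetical, and case-insensitive matching is an assumption about how the filter behaves.

```python
# Hypothetical helper for investigating false positives outside Declude.

# Letter combinations copied from the filter file.
DIGRAPHS = (
    "qb qc qd qf qg qh qi qj qk qm qn qo qp qr qs qt qv qx qy qz "
    "vq wq tq jq xd xj xk xr xz zb zc zf zj zk zm zx"
).split()


def matched_digraphs(text):
    """List the filter's letter combinations found in text, in filter order."""
    lowered = text.lower()  # assumption: CONTAINS matching ignores case
    return [d for d in DIGRAPHS if d in lowered]


# An acronym in ordinary wording can match: "FAQs" contains "qs".
print(matched_digraphs("Read the FAQs first"))
# A made-up tracking-style URL with a random-looking ID.
print(matched_digraphs("http://example.com/r?u=9xKzm2"))
```

Running a suspect message body through a helper like this shows whether the hit came from a URL token or from plain wording, which is the kind of detail worth including in an FP report.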
The filter is designed to be used with v1.75 of Declude with decoding left on (the default). It can be modified to work with older versions of Declude by changing the "attachment;" offset to "base64", in which case it won't score any Base64 content unless that content isn't appropriately tagged (which is useful).
I think this is a killer test. Enjoy.
Matt
# GIBBERISH
# Last Update: 09/12/2003
#
# Description:
# Finds gibberish in the body of the message, including comment blocks. Will be triggered on
# any Base64 encoding due to how common Q combinations are. A negative weight for attachments
# defeats the test, however inline base64 encoded content will receive full scoring. The BASE64
# test should be reduced by the score of this test in order to compensate for this fact.
#
# Usage:
# GIBBERISH filter C:\IMail\Declude\Gibberish.txt x 5 0
#
# False Positives
# Will result primarily from URL's containing random looking strings. Known offenders include
# Buy.com and Yahoo! Groups.

# The following defeats the test if it finds an attachment.
BODY -5 CONTAINS attachment;

# Small list of letter combinations not found in a basic dictionary.
BODY 0 CONTAINS qb
BODY 0 CONTAINS qc
BODY 0 CONTAINS qd
BODY 0 CONTAINS qf
BODY 0 CONTAINS qg
BODY 0 CONTAINS qh
BODY 0 CONTAINS qi
BODY 0 CONTAINS qj
BODY 0 CONTAINS qk
BODY 0 CONTAINS qm
BODY 0 CONTAINS qn
BODY 0 CONTAINS qo
BODY 0 CONTAINS qp
BODY 0 CONTAINS qr
BODY 0 CONTAINS qs
BODY 0 CONTAINS qt
BODY 0 CONTAINS qv
BODY 0 CONTAINS qx
BODY 0 CONTAINS qy
BODY 0 CONTAINS qz
BODY 0 CONTAINS vq
BODY 0 CONTAINS wq
BODY 0 CONTAINS tq
BODY 0 CONTAINS jq
BODY 0 CONTAINS xd
BODY 0 CONTAINS xj
BODY 0 CONTAINS xk
BODY 0 CONTAINS xr
BODY 0 CONTAINS xz
BODY 0 CONTAINS zb
BODY 0 CONTAINS zc
BODY 0 CONTAINS zf
BODY 0 CONTAINS zj
BODY 0 CONTAINS zk
BODY 0 CONTAINS zm
BODY 0 CONTAINS zx