|
Looks like my first attempt to send this was also blocked, so this time the offensive content is going in a zip. Here's the repost:

I collected a list of examples of this obfuscation technique. The first set is all from one spammer (the pill guy who sends a huge volume of crud spam to everyone; some of it does get through, depending on how clean his IP is):

(see zip file)

The second set is from other randomized crap spam. Various techniques are used here:

(see zip file)

I would hate to target just one spammer with a heavy filter (necessary in order to help protect from FPs), and you certainly can't tag all of this stuff. One thought is to just look for non-English characters, and for strings made of a letter, then only certain special characters, then another letter, and score them low. The only problem is that 26 x 26 = 676 letter combinations exist for just one special character in a three-character combo. So some system of limiting the letter choices would be wise. For instance, you could limit the strings to just the 15 most popular letters, which would be only 15 x 15 = 225 combinations per special character (15 x 14 = 210 once doubles are eliminated), and then choose just 5 or so special characters. On a subject search, that should be doable.

Any volunteers for finding the 15 most popular letters? I'll be happy to code it up with a little help.

BTW, spammers using the first type of word obfuscation are also quite likely to use other types, and to fail tests like GIBBERISH, GIBBERISHSUB, OBFUSCATION, DYNAMIC, FOREIGN, Y!DIRECTED, etc. Very little of this stuff gets through our filters because these filters do such a good job at crud detection.

Matt

Kami Razvan wrote:
|
Subject_Randomization.zip
Description: Zip compressed data
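
A rough sketch of the letter / special character / letter subject test described in the message, in Python, may help make the combination counts concrete. Everything named here is an assumption for illustration only: the letter list uses the commonly cited "etaoin shrdlu" frequency ordering (the message leaves the 15 letters open), the five special characters are arbitrary picks, and the scoring values are placeholders rather than anything taken from the filters mentioned above.

```python
import re
from itertools import permutations

# Assumption: 15 most frequent English letters in the commonly cited
# "etaoin shrdlu" ordering. The message does not fix this list.
LETTERS = "etaoinshrdlcumw"
# Assumption: five illustrative special characters, not from the message.
SPECIALS = "._*^~"


def pairs_per_special(letters=LETTERS):
    """Return (pairs with doubles allowed, pairs with doubles removed)."""
    n = len(set(letters))
    return n * n, n * (n - 1)


def enumerate_triples(letters=LETTERS, specials=SPECIALS):
    """List every letter+special+letter string whose two letters differ."""
    return [a + s + b for s in specials for a, b in permutations(letters, 2)]


def build_pattern(letters=LETTERS, specials=SPECIALS):
    """One regex equivalent to the whole enumerated list of triples."""
    letter_cls = "[" + re.escape(letters) + "]"
    special_cls = "[" + re.escape(specials) + "]"
    # (?!\1) rejects doubles such as "e.e"; matching ignores case.
    return re.compile(
        "(" + letter_cls + ")" + special_cls + r"(?!\1)" + letter_cls,
        re.IGNORECASE,
    )


def count_hits(subject, pattern=None):
    """Count non-overlapping letter/special/letter hits in a subject line."""
    pattern = pattern or build_pattern()
    return sum(1 for _ in pattern.finditer(subject))


def score_subject(subject, per_hit=0.5, cap=1.5):
    """Hypothetical low score: a small amount per hit, capped."""
    return min(cap, per_hit * count_hits(subject))
```

With these assumed lists, pairs_per_special() gives 225 pairs with doubles and 210 without, so five special characters produce 1050 strings in total. A single character-class regex covers the same set without listing each string, and a low, capped score keeps an accidental hit (a file name or host name in a subject, say) from deciding the result on its own.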
