I've been receiving a steady stream of spam with the following body format:
[3 centered images, referenced by URLs]
[A short line of random characters, or sometimes a <style> tag]
[Random_character_string][a random word][2nd_random_string]
Last element repeated many times ....

There will be several hundred lines starting with the same 40- to 80-character Random_character_string, followed by a random word, different in each line, followed by a consistent, repeated 2nd_random_string. The repeated strings are different for each spam, but consistent throughout each one. Occasionally a 2nd similar line with different random elements is inserted, but the repeated portions of most lines are frequent enough to identify the spam.

I've been successfully using a filter on my personal mail to sideline these via a dynamic delivery instruction, and have generalized this to a python filter module, attached. The header and match information isn't returned in the SMTP dialog, for security reasons, but you can uncomment the line invoking syslog in the gibDetect method to log more details about the spam.

/etc/pythonfilter.conf contains:

    # gibberish: check message for repetitive gibberish lines
    gibberish

/etc/pythonfilter-modules.conf contains:

    [gibberish.py]
    maxMsgSize = 2000000
    checkLines = 400
    gibLines = 40
    gibChars = 10

These are explained in the module code itself, but basically, for any email smaller than maxMsgSize the module examines the first checkLines lines (400 with the settings above) and looks for gibLines consecutive lines starting with the same gibChars characters.

Gordon, take a look at this code and if you have any suggestions please post them.

-- 
Lindsay Haisley       | "UNIX is user-friendly, it just
FMP Computer Services |       chooses its friends."
512-259-1190          |          -- Andreas Bogk
http://www.fmp.com    |
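To make the body format described above concrete, here is a minimal sketch that builds a synthetic sample with the same layout, which may be handy for testing a filter without waiting for real spam. Everything in it is an assumption for illustration: the helper names (random_string, make_sample_body), the word list, the placeholder image URLs, the 12-character short line, and the 20-character length of the second repeated string are not taken from any actual message.

```python
import random
import string

# Hypothetical word list; the real spam uses arbitrary words.
WORDS = ["apple", "river", "stone", "cloud", "paper", "green"]


def random_string(rng, length):
    """Return `length` random letters and digits (no spaces)."""
    alphabet = string.ascii_letters + string.digits
    return "".join(rng.choice(alphabet) for _ in range(length))


def make_sample_body(n_lines=300, seed=0):
    """Build a synthetic body laid out like the spam described above."""
    rng = random.Random(seed)
    prefix = random_string(rng, rng.randint(40, 80))  # same on every line
    suffix = random_string(rng, 20)                   # same on every line
    lines = [
        # Placeholder image references; these URLs are made up.
        '<center><img src="http://example.invalid/%d.png"></center>' % i
        for i in range(3)
    ]
    lines.append(random_string(rng, 12))  # short line of random characters
    for _ in range(n_lines):
        # Only the word in the middle changes from line to line.
        lines.append(prefix + rng.choice(WORDS) + suffix)
    return lines


body = make_sample_body()
print(len(body))                      # 3 image lines + 1 short line + 300
print(body[4][:10] == body[5][:10])   # repeated lines share their prefix
```

Writing the lines to a temporary file, one per line, gives something that can be fed to a body-file based check.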
#!/usr/bin/python
# vim: set expandtab ai ts=4:

import sys
import os.path
import courier.config
import courier.control
import courier.xfilter
import syslog as S

maxMsgSize = 2000000  # Maximum message size. Pass if larger.
checkLines = 100      # Number of lines (including headers) to check for repetitive gibberish
gibLines = 40         # Number of consecutive gibberish lines required for rejection
gibChars = 10         # Number of characters to check in each line for repetitive gibberish


def initFilter():
    courier.config.applyModuleConfig('gibberish.py', globals())
    # Record in the system log that this filter was initialized.
    sys.stderr.write('Initialized the "gibberish" python filter\n')


def gibDetect(bf):
    # Read the first checkLines lines of the message (headers included).
    a = []
    with open(bf) as bfh:
        for i in range(checkLines):
            a.append(bfh.readline())
    lfcount = 0
    lcount = 0
    lastlf = ''
    subject = ''
    for l in a:
        # Remember the subject so it can be reported with a match.
        if not subject:
            if l[:8] == "Subject:":
                subject = l[9:]
                continue
        lf = l[:gibChars]
        # A repeat is a full-length, space-free prefix equal to the previous line's.
        if lf == lastlf and len(lf) == gibChars and " " not in lf:
            lfcount += 1
            if lfcount >= gibLines:
                # S.syslog(S.LOG_INFO | S.LOG_MAIL, "gibberish: %s: match: %s" % (subject, lastlf))
                return ("gibberish: %s" % subject)
        else:
            lastlf = lf
            lfcount = 0
    return None


def doFilter(bodyFile, controlFileList):
    msgSize = os.path.getsize(bodyFile)
    if msgSize > maxMsgSize:
        return ''
    n = gibDetect(bodyFile)
    if n:
        sender = courier.control.getSendersMta(controlFileList)
        return "500 gibberish spam from %s" % sender
    return ''
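For trying the matching rule without a running Courier install, the following is a minimal standalone sketch of the same counting, written as a pure function over a list of lines. The function name find_gibberish_prefix and its parameter names are hypothetical, and the Subject handling is left out. It also makes one detail of the counting visible: as the loop reads, the counter tracks repeats of the previous line, so with a threshold of 40 the rule fires on the 41st identical-prefix line rather than the 40th.

```python
def find_gibberish_prefix(lines, gib_lines=40, gib_chars=10):
    """Return the repeated prefix if the rule fires, else None.

    A line counts as a repeat when its first gib_chars characters equal
    the previous line's, the prefix is a full gib_chars long, and it
    contains no space.
    """
    last_prefix = ""
    repeats = 0
    for line in lines:
        prefix = line[:gib_chars]
        if prefix == last_prefix and len(prefix) == gib_chars and " " not in prefix:
            repeats += 1
            if repeats >= gib_lines:
                return prefix
        else:
            # Run broken: remember this prefix and start counting again.
            last_prefix = prefix
            repeats = 0
    return None


# Made-up sample line: fixed string, a word, second fixed string.
spam_line = "Xq7Lm2Zp9Kd4Rt" + "word" + "Jh3Vb8\n"
print(find_gibberish_prefix([spam_line] * 40))  # None: only 39 repeats
print(find_gibberish_prefix([spam_line] * 41))  # Xq7Lm2Zp9K
```

If the intent is to trigger on exactly gibLines identical lines, comparing against gib_lines - 1 would be one way to do it; whether that matters in practice depends on how long the runs in real messages are.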
------------------------------------------------------------------------------
_______________________________________________
courier-users mailing list
courier-users@lists.sourceforge.net
Unsubscribe: https://lists.sourceforge.net/lists/listinfo/courier-users