Here's an update on my gibberish.py courier pythonfilter module. This
takes into account the latest metastasis of this form of gibberish spam
in which different random patterns occur on alternate lines. The module
looks at successive lines, one at a time, and if no two match in
succession, it looks at every other line for matches, and then every
3rd line, etc., up to skipLines lines. If a line is shorter than
gibChars, it eats an extra line and continues with the same skip value.
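
To show which lines get compared for each skip value, here is a small standalone sketch. It copies the sampling logic of piter() from the module, but returns a list and takes gibChars as a parameter so it runs without courier installed. The sample lines are made up for illustration.

```python
def piter(arr, n, gibChars=10):
    # Same sampling as the module's piter(), returned as a list.
    iterLines = iter(arr)
    kept = []
    for line in iterLines:
        # Skip ahead n lines; next() yields False once the input runs out.
        for _ in range(n):
            line = next(iterLines, False)
        if line and len(line) >= gibChars:
            kept.append(line)
        else:
            # Short or missing line: drop it and eat one extra line.
            next(iterLines, False)
    return kept

# Eight made-up lines, each 13 characters long.
lines = ["line%02d-xxxxxx" % i for i in range(8)]

every = piter(lines, 0)   # every line
second = piter(lines, 1)  # every other line, starting with the 2nd
third = piter(lines, 2)   # every 3rd line, starting with the 3rd

# A short line is dropped together with the line that follows it.
shortCase = piter(["aaaaaaaaaaaa", "short", "bbbbbbbbbbbb", "cccccccccccc"], 0)
```

With skip value 1 this keeps lines 01, 03, 05 and 07; with skip value 2 it keeps lines 02 and 05.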

I think this code is OK, but Gordon might want to sign off on it. I'm
tired, and it took me a long time to get this working :(  But it does
work here.

Comment out the syslog invocation to save log space. This is debugging
information, but may also be useful for automated log analysis.
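
Instead of commenting the call out, a module-level switch might be handier. This is only a sketch: logMatches and logMatch() are hypothetical names that are not in the module below, and I'm assuming the switch could be set in the module configuration like the other settings.

```python
import syslog as S

logMatches = True
# Hypothetical setting: set to False to suppress the per-match log entry.

def logMatch(message, enabled=None, emit=None):
    # Hypothetical helper. Returns True if the message was logged.
    if enabled is None:
        enabled = logMatches
    if not enabled:
        return False
    if emit is None:
        # Same priority and facility as the syslog call in doFilter().
        emit = lambda m: S.syslog(S.LOG_INFO | S.LOG_MAIL, m)
    emit(message)
    return True
```

The emit argument is there only so the helper can be exercised without writing to the real system log.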

This code could probably be tightened up considerably. I'm using a
couple of Python iterators, and there are probably faster ways to
do this using simple list index arithmetic.
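
One possible version of the index arithmetic, as a rough sketch: an extended slice picks every (n+1)th line directly. sampleLines() is a hypothetical name, and this is not a drop-in replacement for piter(). It simply filters out short lines, whereas piter() also eats the line after a short one, so the two can select different lines once a short line turns up.

```python
def sampleLines(arr, n, gibChars=10):
    # arr[n::n + 1] starts at index n and steps by n + 1, i.e. it takes
    # every (n+1)th line. Lines shorter than gibChars are filtered out.
    return [line for line in arr[n::n + 1] if len(line) >= gibChars]

# Eight made-up lines, each 13 characters long.
lines = ["line%02d-xxxxxx" % i for i in range(8)]

second = sampleLines(lines, 1)  # indexes 1, 3, 5, 7
third = sampleLines(lines, 2)   # indexes 2, 5
```

When every line is at least gibChars long, this selects the same lines as the iterator version.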

I HATE spammers!

-- 
Lindsay Haisley       | "UNIX is user-friendly, it just
FMP Computer Services |       chooses its friends."
512-259-1190          |          -- Andreas Bogk
http://www.fmp.com    |

#!/usr/bin/python
# vim: set expandtab ai ts=4:

import sys
import os.path
import syslog as S
import courier.config
import courier.control

maxMsgSize = 2000000
# Maximum message size. Pass if larger.

checkLines = 100
# Number of lines (including headers) to check for repetitive gibberish

gibLines = 40
# Number of consecutive gibberish lines required for rejection

gibChars = 10
# Number of characters to check in each line for repetitive gibberish

skipLines = 4
# Number of line intervals to try: every line, every 2nd line, and so
# on, up to every skipLines-th line


def initFilter():
    courier.config.applyModuleConfig('gibberish.py', globals())
    # Record in the system log that this filter was initialized.
    sys.stderr.write('Initialized the "gibberish" python filter\n')

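def piter(arr,n):
    # Return an iterator over every (n+1)th line of arr that is at least
    # gibChars long. A short or missing line is dropped and the line after
    # it is consumed as well, so the skip value stays the same.
    iterLines = iter(arr)
    iterArr = []
    for foo in iterLines:
        # Skip ahead n lines; next() yields False once the input runs out.
        for r in range(n):
            foo = next(iterLines,False)
        if foo and len(foo) >= gibChars:
            iterArr.append(foo)
        else:
            foo = next(iterLines,False)
    return iter(iterArr)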

def gibDetect(bf):
    global gLskip
    a = []
    # Read the first checkLines lines; the file is closed on leaving the block.
    with open(bf) as bfh:
        for i in range(checkLines):
            a.append(bfh.readline())
    
    lfcount = 0
    lcount = 0
    lastlf = ''
    subject = ''
    
    for l in a:
        if not subject:
            if l[:8] == "Subject:":
                subject = l[9:]
                break

    for lskip in range(skipLines):
        for l in piter(a,lskip):
            lf = l[:gibChars]
            if lf == lastlf and " " not in lf:
                lfcount += 1
                if lfcount >= gibLines:
                    gLskip = lskip
                    return ("gibberish: %s: match: %s" % (subject.rstrip(), lastlf))
            else:
                lastlf = lf
                lfcount = 0
    gLskip = lskip
    return None

def doFilter(bodyFile, controlFileList):
    msgSize = os.path.getsize(bodyFile)
    if msgSize > maxMsgSize:
        return ''

    n = gibDetect(bodyFile)
    if n:
        sender = courier.control.getSendersMta(controlFileList) 
        S.syslog(S.LOG_INFO | S.LOG_MAIL, n + "; " + sender[5:] + ": lskip=%s" % gLskip)
        return "500 gibberish spam from %s" % sender
    return ''

------------------------------------------------------------------------------
_______________________________________________
courier-users mailing list
courier-users@lists.sourceforge.net
Unsubscribe: https://lists.sourceforge.net/lists/listinfo/courier-users
