Hi all,

Not sure if this is more appropriate here or on the users list, but here goes.
For some time now I've been capturing messages from dynamic ranges on networks with bot infestations and feeding them as fresh spam to sa-learn every 60 seconds, whenever Bayes has learnt fewer spam messages than ham. This has been working rather well, particularly for 419/lottery spam plus the usual spew.

I just noticed that the file sizes of a lot of these spams are very similar:

-rw-rw---- 1 smtpf smtpf 2054 Aug 3 12:18 l72CHw2505731641Hk.trap
-rw-rw---- 1 smtpf smtpf  718 Aug 3 12:18 l72CI12505731666rm.trap
-rw-rw---- 1 smtpf smtpf  461 Aug 3 12:18 l72CI22505731688l1.trap
-rw-rw---- 1 smtpf smtpf  506 Aug 3 12:18 l72CI22505731695sU.trap
-rw-rw---- 1 smtpf smtpf 2055 Aug 3 12:18 l72CI42505731709VA.trap
-rw-rw---- 1 smtpf smtpf 2027 Aug 3 12:18 l72CI62505731690kY.trap
-rw-rw---- 1 smtpf smtpf 2066 Aug 3 12:18 l72CI62505731718gr.trap
-rw-rw---- 1 smtpf smtpf 2069 Aug 3 12:18 l72CI62505731720bO.trap
-rw-rw---- 1 smtpf smtpf  478 Aug 3 12:18 l72CI725057317172l.trap
-rw-rw---- 1 smtpf smtpf 2060 Aug 3 12:18 l72CI725057317285c.trap
-rw-rw---- 1 smtpf smtpf  439 Aug 3 12:18 l72CI92505731743cV.trap
-rw-rw---- 1 smtpf smtpf 2057 Aug 3 12:18 l72CI925057317478s.trap
-rw-rw---- 1 smtpf smtpf  439 Aug 3 12:18 l72CIa2505731954DQ.trap
-rw-rw---- 1 smtpf smtpf 2068 Aug 3 12:18 l72CID25057317704v.trap
-rw-rw---- 1 smtpf smtpf  488 Aug 3 12:18 l72CID2505731771OK.trap

That got me thinking, so I wrote a plug-in that takes the size of the message in bytes, rounds it to the nearest 100 bytes, and puts the result into a tag. The idea is to add the tag as an extra header so that Bayes can use it as extra metadata.

This all works, and I correctly see the 'X-Spam-Size: 2100' header in my test message. However, Bayes never seems to use the header for a token (I'm using the CollectTokens plug-in to check).

Is this expected behaviour, or am I doing something wrong? The plug-in adds the tag using the parsed_metadata hook.

Cheers,
Steve.
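
For anyone wanting to see the bucketing arithmetic spelled out, here is a minimal sketch in Python. It is not the plug-in itself (that is not shown in this post), and the function name size_bucket is hypothetical. It assumes "nearest 100 bytes" means halves round up. The sample sizes are taken from the listing above.

```python
def size_bucket(size_bytes: int, step: int = 100) -> int:
    """Round a message size to the nearest multiple of `step`.

    Hypothetical helper for illustration only; exact halves round up
    (e.g. 450 -> 500), which is an assumption about the plug-in.
    """
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    if step <= 0:
        raise ValueError("step must be positive")
    # Adding half a step before integer division gives round-half-up.
    return (size_bytes + step // 2) // step * step


if __name__ == "__main__":
    # A few sizes from the .trap listing in this post.
    for size in (2054, 718, 461, 506, 439):
        print(f"{size:5d} bytes -> X-Spam-Size: {size_bucket(size)}")
```

With this rounding, the 2027 to 2069 byte messages fall into just two buckets (2000 and 2100), and a 2054-byte message gets the value 2100 seen in the test header.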
