Hi all,

Not sure whether this is more appropriate here or on the users list, but
here goes.

For some time now I've been capturing messages from dynamic ranges on
networks with bot infestations and feeding them as fresh spam to
sa-learn every 60 seconds, whenever Bayes has learnt fewer spam
messages than ham. This has been working rather well, particularly for
419/lottery spam plus the usual spew.
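
For anyone wanting to reproduce the "spam < ham" check, here is a rough
sketch in Python. It is not my actual script. It assumes the counts come
from the text printed by `sa-learn --dump magic`, that the nspam and nham
lines end in those labels, and that the count is the third
whitespace-separated column; check that against your own output. The
function names are made up for illustration.

```python
def learnt_counts(dump_magic_output):
    """Extract nspam/nham from text assumed to come from `sa-learn --dump magic`."""
    counts = {}
    for line in dump_magic_output.splitlines():
        fields = line.split()
        # Assumed layout: <prob> <spam> <count> <atime> non-token data: nspam
        if len(fields) >= 4 and fields[-1] in ("nspam", "nham"):
            counts[fields[-1]] = int(fields[2])
    return counts


def should_learn_spam(dump_magic_output):
    """True when Bayes has learnt fewer spam messages than ham messages."""
    counts = learnt_counts(dump_magic_output)
    return counts.get("nspam", 0) < counts.get("nham", 0)


# Hypothetical sample text, only to show the assumed layout.
sample = (
    "0.000          0       1200          0  non-token data: nspam\n"
    "0.000          0       1500          0  non-token data: nham\n"
)
print(should_learn_spam(sample))
```

A cron job or loop would run this every 60 seconds and only hand the
trapped messages to sa-learn when it returns true.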

Anyway - I just noticed that the file sizes of a lot of these spams are
very similar:

-rw-rw----  1 smtpf smtpf    2054 Aug  3 12:18 l72CHw2505731641Hk.trap
-rw-rw----  1 smtpf smtpf     718 Aug  3 12:18 l72CI12505731666rm.trap
-rw-rw----  1 smtpf smtpf     461 Aug  3 12:18 l72CI22505731688l1.trap
-rw-rw----  1 smtpf smtpf     506 Aug  3 12:18 l72CI22505731695sU.trap
-rw-rw----  1 smtpf smtpf    2055 Aug  3 12:18 l72CI42505731709VA.trap
-rw-rw----  1 smtpf smtpf    2027 Aug  3 12:18 l72CI62505731690kY.trap
-rw-rw----  1 smtpf smtpf    2066 Aug  3 12:18 l72CI62505731718gr.trap
-rw-rw----  1 smtpf smtpf    2069 Aug  3 12:18 l72CI62505731720bO.trap
-rw-rw----  1 smtpf smtpf     478 Aug  3 12:18 l72CI725057317172l.trap
-rw-rw----  1 smtpf smtpf    2060 Aug  3 12:18 l72CI725057317285c.trap
-rw-rw----  1 smtpf smtpf     439 Aug  3 12:18 l72CI92505731743cV.trap
-rw-rw----  1 smtpf smtpf    2057 Aug  3 12:18 l72CI925057317478s.trap
-rw-rw----  1 smtpf smtpf     439 Aug  3 12:18 l72CIa2505731954DQ.trap
-rw-rw----  1 smtpf smtpf    2068 Aug  3 12:18 l72CID25057317704v.trap
-rw-rw----  1 smtpf smtpf     488 Aug  3 12:18 l72CID2505731771OK.trap

So that got me thinking: I'd write a plug-in that takes the size of the
message in bytes, rounds it to the nearest 100 bytes and stores it in a
TAG, so that I could add it as an extra header and have Bayes use it as
extra metadata.
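
To illustrate the rounding (a Python sketch only; the real plug-in is
not shown here, and the function name and the half-rounds-up choice are
my assumptions), this is how the sizes from the listing above fall into
100-byte buckets:

```python
from collections import Counter


def round_to_hundred(size_bytes):
    """Round a byte count to the nearest 100; exact halves round up (assumption)."""
    return (size_bytes + 50) // 100 * 100


# File sizes copied from the directory listing above.
sizes = [2054, 718, 461, 506, 2055, 2027, 2066, 2069,
         478, 2060, 439, 2057, 439, 2068, 488]

# Count how many trapped messages land in each bucket.
buckets = Counter(round_to_hundred(s) for s in sizes)

for bucket, count in sorted(buckets.items()):
    print(f"X-Spam-Size: {bucket}  ({count} messages)")
```

Seven of the fifteen messages end up in the 2100 bucket and four in the
500 bucket, which is the kind of clustering I was hoping Bayes could
pick up on.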

This all works, and I see the expected 'X-Spam-Size: 2100' header in my
test message; however, Bayes never seems to use the header as a token
(I'm using the CollectTokens plug-in to check).

Is this expected behaviour?  Or am I doing something wrong?  The plug-in
adds the tag using the parsed_metadata hook.

Cheers,
Steve.
