I'd like everybody to run it in a daily cron job (along with your
mass-checks, if you're doing them).

http://www.chaosreigns.com/iprep/dl/iprep.pl

Works like:

./iprep.pl ham:dir:~/masscheckwork/ham spam:dir:~/masscheckwork/spam/

Where the arguments are the same as for mass-check.

The config file is ~/.ipreprc:
$trusted_networks = '';
$user = 'username';
$pass = 'password';

Email me for an account.  There are more detailed instructions in the
perl script (like argument definitions, for those not familiar with
mass-check targets).

It uploads IP address and date of each ham and spam to my server via rsync.
(Everybody gets their own chroot jail, and I consider the data
confidential.)

I'm planning to aggregate the data and make it available as:

IP <percent ham> <count>

Where <count> is a logarithm of the total number of emails seen from that
IP, and <percent ham> is normalized the same as the s/o value in ruleqa.
Old values will receive less weight than new values.
(Maybe 0.99^(age in days)?)
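
To make the weighting idea concrete, here is a rough Python sketch (not
the actual aggregation code, which isn't written yet).  The input tuple
format, the function name, and the base-10 logarithm are my assumptions
for illustration, the 0.99 factor is just the value floated above, and
the ruleqa-style s/o normalization is left out entirely.

```python
import math
from collections import defaultdict

DECAY = 0.99  # per-day factor floated above; not a settled value


def aggregate(observations, decay=DECAY):
    """Return {ip: (percent_ham, log_count)}.

    observations: iterable of (ip, is_ham, age_days) tuples.
    This tuple layout is hypothetical, not what iprep.pl uploads.
    """
    ham_weight = defaultdict(float)
    total_weight = defaultdict(float)
    count = defaultdict(int)

    for ip, is_ham, age_days in observations:
        weight = decay ** age_days  # older mail counts for less
        total_weight[ip] += weight
        if is_ham:
            ham_weight[ip] += weight
        count[ip] += 1

    result = {}
    for ip, total in total_weight.items():
        percent_ham = 100.0 * ham_weight[ip] / total
        # Base 10 is an assumption; the post only says "a logarithm".
        result[ip] = (percent_ham, math.log10(count[ip]))
    return result
```

With this weighting, a ham from 100 days ago counts for about 0.37 of a
spam seen today, so an IP with one of each would come out well under 50%
ham.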

I kind of like the idea of only making the data available via rsync.  It
seems like that would reduce bandwidth usage, relative to serving via DNS?


Next I'm planning to create a plugin to create tests to record values
(like iprep_ham_<percent>, iprep_count_<count>).  Then I can use them
to determine what tests would be most useful.

Output from my own corpora:
http://www.chaosreigns.com/iprep/iprep.txt


With 2618 hams and 2956 spams, there were only *two* IP addresses that
were not 100% spam or 100% ham.  Both belong to Google.

For IPv6, I'm thinking about aggregating at /48, just because that's what
he.net is letting me allocate.  That leaves 80 bits of addresses.  This is
an attempt to deal with a problem Warren worded well:  "IPv6 makes it
possible to send one spam per IPv6 address and never run out of IP
addresses".
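
As a rough illustration of the /48 idea, here is a Python sketch using the
standard ipaddress module.  The function name and the choice to leave
IPv4 addresses unaggregated are my assumptions for the example; the /48
prefix length is just the one I'm considering above.

```python
import ipaddress


def aggregation_key(addr, v6_prefix=48):
    """Return the key under which an address's mail would be counted.

    IPv6 addresses collapse to their enclosing /48 (prefix length is
    the tentative choice above); IPv4 addresses are kept as-is, which
    is an assumption of this sketch.
    """
    ip = ipaddress.ip_address(addr)
    if ip.version == 6:
        # strict=False masks off the host bits instead of raising.
        net = ipaddress.ip_network(f"{ip}/{v6_prefix}", strict=False)
        return str(net)
    return str(ip)
```

So 2001:db8:1234:5678::1 and 2001:db8:1234:ffff::2 would both be counted
under 2001:db8:1234::/48, and a sender hopping around inside one /48
would still build up a single reputation.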

-- 
"For every complex problem, there is a solution that is simple, neat,
and wrong." - H. L. Mencken
http://www.ChaosReigns.com
