On Mon, 23 Apr 2007, Vincent Fleming wrote:
;
; Can some of you on the list help out here and comment with your traffic
; patterns?

Quite happy to. It certainly looks like this approach is going to be useful.

If you look at http://www.fiddaman.net/t.html you'll see a table generated from my last few hours' worth of mail - I've blocked out the last two octets of the IP addresses but otherwise it's as it comes out of the database. I've only shown IPs which have sent more than 50 messages, and Norm5tot is sum(score - 5).

Based on this small sample of data, the split between spam and non-spam sources looks very clear cut - there may well be some scope for weighting in both directions in the same way that AWL does.

I'm cautious, so I'm wondering about something like the following algorithm - the numbers are just starting points.

Using a data window of the last 5 days,

    IF average score > 20
    AND total score normalised around 5 > 500
    AND Ham/Spam ratio < 0.1
        Start randomly sampling email from the IP such that ~1 in 20
        is passed to SA, the others just get assigned the average score.
    ENDIF

That data table took just a few seconds to generate from the log data I record anyway, so I can easily convert it into a lookup table for the milter and update it every hour or so.

Of course, the SA approach would be to implement this as a plugin, but I have to say that the idea of avoiding the SA overhead completely appeals. A plugin wouldn't be too difficult though; the existing AWL plugin has most of the code and structure required.

A.
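
For illustration, the rule described above could be sketched in Python roughly as follows. This is only a sketch under stated assumptions, not the milter's actual code: the function names, the per-IP list of scores, and the `run_sa` callback are all hypothetical, and counting a message as spam when its score is at least 5 is an assumption (the message does not say how the Ham/Spam ratio is computed).

```python
import random
from typing import Callable, Sequence

SPAM_THRESHOLD = 5.0  # assumption: a score >= 5 counts as spam
SAMPLE_RATE = 1 / 20  # ~1 in 20 messages still goes to SA


def summarise(scores: Sequence[float]) -> tuple:
    """Return (average score, Norm5tot, ham/spam ratio) for one IP.

    `scores` is assumed to hold the scores logged for that IP
    within the data window (e.g. the last 5 days).
    """
    average = sum(scores) / len(scores)
    norm5tot = sum(s - SPAM_THRESHOLD for s in scores)
    spam = sum(1 for s in scores if s >= SPAM_THRESHOLD)
    ham = len(scores) - spam
    # With no spam at all the ratio is treated as infinite,
    # so the IP can never qualify as a spam source.
    ratio = ham / spam if spam else float("inf")
    return average, norm5tot, ratio


def is_spam_source(scores: Sequence[float]) -> bool:
    """Apply the three starting-point thresholds from the message."""
    if not scores:
        return False
    average, norm5tot, ratio = summarise(scores)
    return average > 20 and norm5tot > 500 and ratio < 0.1


def score_message(
    scores: Sequence[float],
    run_sa: Callable[[], float],
    rng: Callable[[], float] = random.random,
) -> float:
    """Score a new message from an IP with the given history.

    For a qualifying spam source, roughly 19 in 20 messages skip SA
    and are assigned the IP's average score; the rest are passed to
    SA via the (hypothetical) `run_sa` callback.
    """
    if is_spam_source(scores) and rng() >= SAMPLE_RATE:
        return sum(scores) / len(scores)
    return run_sa()
```

Keeping a small sampled fraction going through SA means the per-IP statistics continue to be refreshed with real scores, so an IP that stops sending spam can eventually drop back out of the lookup table.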
