-------- Original Message --------
> Date: Mon, 3 Aug 2009 08:53:09 -0700
> From: "Michael Watkins" <[email protected]>
> To: [email protected]
> Subject: Re: [Dspam-user] RBL Configuration

> On Mon, August 3, 2009 01:51, Steve wrote:
> > So a distance of 2500 Km would result in:
> > + 0.50
> > + 1.00
> > ------
> > + 1.50
> >
> > A distance of 1500 Km would result in:
> > + 0.50
> > - 1.00
> > ------
> > - 0.50
> >
> > Ach! Reading that now already tells me that most users would not be able
> > to handle that. Better to go with the commutative approach. Or what do
> > you think?
>
> I tend to agree - it would probably raise more questions than users.
>
> > So you think I should take the time to quickly add the scoring by
> > distance and scoring by continent into policyd-weight?
>
> Some admins might employ both, but I imagine that scoring by continent
> will be adopted by more than scoring by distance.
>
> Of course some cases will not fall neatly into either. I see a fair amount
> of spam attempts from Central American nations, most of which fall within
> the distance span covered by most points in Canada to most points
> within the US, so distance filtering is less helpful, and Central America
> isn't a continent but a region of North America.
>
> Let's say I was providing mail services for diverse users across Canada and
> the U.S.; I likely would not use distance filtering to deal with South
> America (hello Brazil, you shimmering sun-baked source of tons of spam)
> but continent filtering certainly would save me from maintaining a lengthy
> list of country ranks including CL, AR, PE, UY and others.
>
> In my case I know I would quickly implement continent support and might
> later dabble with distance but suspect that continent and countries would
> be most helpful.
>

I just implemented that Geo::IP stuff (country, distance and continent lookup/matching) in policyd-weight. I will see how well it helps to reduce SPAM and how well it helps to reduce FP. I did a quick test on a mail cluster, installed at the end of June, where DSPAM 3.9.0 is running. All the data was new, so there are probably a lot of FP/FN. Anyway...
I used the result from DSPAM and computed the distance for HAM and SPAM in the month of June. This is the result:

MX1 HAM              MX2 HAM
<100:     1994       <100:     1769
<6000:       1       <6000:       1
>=6000:   1702       >=6000:   1629
>=8000:   1501       >=8000:   1450
>=10000:    30       >=10000:    21

MX1 SPAM             MX2 SPAM
<100:       80       <100:       56
<6000:       1       <6000:       1
>=6000:   1450       >=6000:   1151
>=8000:   1270       >=8000:   1013
>=10000:    76       >=10000:    37

The values are cumulative.

And the same for the month of August:

MX1 HAM              MX2 HAM
<100:      107       <100:      105
<6000:       1       <6000:       1
>=6000:    201       >=6000:    167
>=8000:    178       >=8000:    151
>=10000:     1       >=10000:     1

MX1 SPAM             MX2 SPAM
<100:        7       <100:        9
<6000:       1       <6000:       1
>=6000:    105       >=6000:    100
>=8000:     93       >=8000:     81
>=10000:    11       >=10000:     7

August shows a better distribution of HAM/SPAM at the edge than June. That is probably because users are training and accuracy is getting better, so there are fewer FP/FN to pollute the distance lookup I did based on the status DSPAM tracked in the log.

I would say that the data is significant enough to be useful. Just looking at August shows me that of the 422 mails on MX1, just one HAM message was farther away than 10,000 km, while 11 SPAM messages were farther away than 10,000 km. So the significance is not in the tenth-of-a-percent range but in the whole-percent range. This is significant enough for me.

The computation was ultra fast. I am surprised how fast it went. And I did a double lookup (my IP and the remote IP). Inside policyd-weight I don't look up my IP and I don't compute PI but have it fixed to around 30 digits. So that should give an extra speed boost. But to be honest: it is already very fast. Definitely much faster than doing DNS lookups and for sure faster than DSPAM.

I definitely will try to use that data to leverage the weighting. Currently I have put the code at the absolute end of processing in policyd-weight, so it will not fire every time if there are already enough other scores.
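
For anyone who wants to reproduce this kind of tally, here is a minimal sketch of the distance computation and the bucket counting. It is Python, not the Perl code that went into policyd-weight, and it is only an illustration under stated assumptions: the haversine formula, the mean Earth radius, and the names `haversine_km` and `tally` are all assumed here, and the coordinates are expected to come from a GeoIP lookup that is not shown. The bucket logic follows one reading of the tables above (`<100`, `<6000` and `>=6000` split the messages, while `>=8000` and `>=10000` are cumulative subsets of `>=6000`).

```python
import math
from collections import Counter

# Assumption: mean Earth radius in km (not a value taken from policyd-weight).
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp against rounding errors before taking the square root.
    a = min(1.0, max(0.0, a))
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def tally(distances_km):
    """Count messages per distance bucket, using the labels from the tables."""
    counts = Counter()
    for d in distances_km:
        if d < 100:
            counts["<100"] += 1
        elif d < 6000:
            counts["<6000"] += 1
        else:
            counts[">=6000"] += 1
            # Cumulative sub-buckets of >=6000.
            if d >= 8000:
                counts[">=8000"] += 1
            if d >= 10000:
                counts[">=10000"] += 1
    return counts
```

To use it, compute one distance per logged message between the (fixed) coordinates of the mail server and the coordinates of the connecting IP, then pass the HAM and the SPAM distances to `tally` separately.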
I might consider adding it to an earlier stage, since the result of the distance lookup would then be added to the mail headers and DSPAM might make good use of that info when processing the message. I just have to play with it and see how well I can use that info.

Thanks, Michael, for reminding me again about Geo-IP. Without you I would not have looked into that again.

// Steve
