Re: [Dspam-user] RBL Configuration

Steve Mon, 03 Aug 2009 15:10:31 -0700

-------- Original Message --------
> Date: Mon, 3 Aug 2009 08:53:09 -0700
> From: "Michael Watkins" <[email protected]>
> To: [email protected]
> Subject: Re: [Dspam-user] RBL Configuration

> On Mon, August 3, 2009 01:51, Steve wrote:
> > So a distance of 2500 Km would result in:
> > + 0.50
> > + 1.00
> > ------
> > + 1.50
> >
> > A distance of 1500 Km would result in:
> > + 0.50
> > - 1.00
> > ------
> > - 0.50
> >
> > Ach! Reading that now already tells me that most users would not be able
> > to handle that. Better to go with the commutative approach. Or what do
> > you think?
> 
> I tend to agree - it would probably raise more questions than users.
> 
> > So you think I should take the time to quickly add the scoring by
> > distance and scoring by continent into policyd-weight?
> 
> Some admins might employ both, but I imagine that scoring by continent
> will be adopted by more than scoring by distance.
> 
> Of course some cases will not fall neatly into either. I see a fair amount
> of spam attempts from Central American nations, most of which fall within
> the distance span covered by most points in Canada to most points
> within the US, so distance filtering is less helpful, and Central America
> isn't a continent but a region of North America.
> 
> Let's say I was providing mail services for diverse users across Canada and
> the U.S.; I likely would not use distance filtering to deal with South
> America (hello Brazil, you shimmering sun-baked source of tons of spam)
> but continent filtering certainly would save me from maintaining a lengthy
> list of country ranks including CL, AR, PE, UY and others.
> 
> In my case I know I would quickly implement continent support and might
> later dabble with distance but suspect that continent and countries would
> be most helpful.
> 
I have just implemented the Geo::IP support (country, distance and continent
lookup/matching) in policyd-weight. I will see how well it helps reduce SPAM
and how well it helps reduce FP.
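Michael's point about continents (one rule for South America instead of a
long list of country ranks for CL, AR, PE, UY and others) can be sketched
roughly like this. This is a minimal Python illustration, not the
policyd-weight (Perl) code: the country-to-continent table is a tiny
hypothetical excerpt (a real setup would take the continent from the GeoIP
data), and the scores are made-up placeholders.

```python
# Hypothetical excerpt of a country -> continent table; a real setup would
# take the continent code from the GeoIP database instead.
COUNTRY_TO_CONTINENT = {
    "CA": "NA", "US": "NA", "MX": "NA",
    "BR": "SA", "CL": "SA", "AR": "SA", "PE": "SA", "UY": "SA",
    "DE": "EU", "CH": "EU",
}

# Placeholder scores: positive values push a message towards SPAM.
CONTINENT_SCORES = {"SA": 1.5, "AS": 1.0}

# Optional per-country overrides; they win over the continent score.
COUNTRY_SCORES = {"BR": 2.0}


def geo_score(country_code):
    """Return the score for a client country, preferring a country rule."""
    cc = country_code.upper()
    if cc in COUNTRY_SCORES:
        return COUNTRY_SCORES[cc]
    continent = COUNTRY_TO_CONTINENT.get(cc)
    # Unknown countries or continents without a rule score 0.0.
    return CONTINENT_SCORES.get(continent, 0.0)
```

With these placeholder tables, `geo_score("uy")` picks up the South America
rule (1.5), `geo_score("BR")` uses its country override (2.0), and
`geo_score("CA")` scores 0.0, so a single continent entry replaces the list
of individual country ranks.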

I did a quick test on a mail cluster running DSPAM 3.9.0 that was installed at
the end of June. All the data were new, so there are probably a lot of FP/FN.
Anyway, I took the classification results from DSPAM and computed the distance
for HAM and SPAM for the month of June. This is the result:

Distance (km)   MX1 HAM   MX2 HAM  MX1 SPAM  MX2 SPAM
<100               1994      1769        80        56
<6000                 1         1         1         1
>=6000             1702      1629      1450      1151
>=8000             1501      1450      1270      1013
>=10000              30        21        76        37

The values are cumulative: the >=8000 and >=10000 counts are also included in
the >=6000 row.

And the same for the month of August:
Distance (km)   MX1 HAM   MX2 HAM  MX1 SPAM  MX2 SPAM
<100                107       105         7         9
<6000                 1         1         1         1
>=6000              201       167       105       100
>=8000              178       151        93        81
>=10000               1         1        11         7

August shows a better separation of HAM and SPAM at the far end of the distance
range than June. This is probably because users are training DSPAM and its
accuracy is improving, so there are fewer FP/FN polluting the distance lookup,
which I based on the classification DSPAM recorded in its log.

I would say that the data is significant enough to be useful. Looking at August
alone: of the 422 messages on MX1, just one HAM message came from farther away
than 10'000 km, while 11 SPAM messages did (about 0.24% versus 2.6% of all
mail). So the difference is not in the tenth-of-a-percent range but in the
whole-percent range. That is significant enough for me.
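The percentages for August on MX1 can be reproduced from the table with a few
lines of Python. This sketch assumes the "<6000" row is the band between 100
and 6000 km and that the ">=" rows are nested; under that reading the
"<100", "<6000" and ">=6000" rows of both classes add up to exactly the 422
messages mentioned above.

```python
# August counts for MX1, copied from the table. The ">=" rows are nested
# (>=8000 is part of >=6000), so only the first three rows are disjoint.
mx1_aug = {
    "HAM":  {"<100": 107, "<6000": 1, ">=6000": 201, ">=8000": 178, ">=10000": 1},
    "SPAM": {"<100": 7,   "<6000": 1, ">=6000": 105, ">=8000": 93,  ">=10000": 11},
}


def total(counts):
    """Messages in one class: sum of the disjoint bands only."""
    return counts["<100"] + counts["<6000"] + counts[">=6000"]


all_mail = total(mx1_aug["HAM"]) + total(mx1_aug["SPAM"])

for label in ("HAM", "SPAM"):
    far = mx1_aug[label][">=10000"]
    share = 100 * far / all_mail
    print(f"{label} >= 10000 km: {far} of {all_mail} = {share:.2f}%")
```

This prints 0.24% for HAM and 2.61% for SPAM. Dividing by the per-class
totals instead (309 HAM, 113 SPAM) makes the gap even wider.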

The computation was very fast; I was surprised how quickly it went, even though
I did a double lookup (my IP and the remote IP). Inside policyd-weight I don't
look up my own IP, and I don't compute PI but have it fixed to about 30 digits,
so that should give an extra speed boost. But to be honest, it is already very
fast: definitely much faster than doing DNS lookups, and certainly faster than
DSPAM.
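The post does not say which formula the distance check uses. A common choice
for the distance between two GeoIP coordinates is the haversine great-circle
formula; the Python sketch below assumes it (the policyd-weight code itself is
Perl) with a mean Earth radius of 6371 km. Since the local MX does not move,
only the client IP needs a lookup; MY_LAT/MY_LON are hypothetical placeholders
for that fixed position, and math.pi is a precomputed constant, much like the
fixed PI value mentioned above.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

# Hypothetical fixed coordinates of the local MX (not from the post).
MY_LAT, MY_LON = 47.37, 8.54


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # min() guards against rounding pushing a slightly above 1.0
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def client_distance_km(client_lat, client_lon):
    """Distance from our fixed MX position to the client's GeoIP location."""
    return distance_km(MY_LAT, MY_LON, client_lat, client_lon)
```

Only a handful of floating-point operations are needed per message, which is
consistent with the lookup being much cheaper than a DNS query.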

I will definitely try to use that data to improve the weighting.
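One way to turn distance into a weight is the cumulative scheme from the
quoted discussion: every threshold the distance reaches adds its own score.
The sketch below reuses the band edges from the tables (6000, 8000 and
10'000 km), but the score values are placeholders, not tuned policyd-weight
settings.

```python
# (threshold in km, score added once the distance reaches it).
# Placeholder values for illustration only.
DISTANCE_RULES = [(6000, 0.5), (8000, 0.5), (10000, 1.0)]


def distance_score(km, rules=DISTANCE_RULES):
    """Cumulative scoring: add the score of every threshold that is reached."""
    return sum(score for threshold, score in rules if km >= threshold)
```

With these rules a client 50 km away scores 0, one at 8500 km scores 1.0 and
one at 12'000 km scores 2.0. Negative scores could be added the same way to
reward nearby clients.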

Currently I have put the code at the very end of processing in policyd-weight,
so it will not run every time if the other checks have already produced enough
of a score. I might consider moving it to an earlier stage, since the distance
result would then be added to the mail headers and DSPAM might make good use of
that information when processing the message.
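Placing the check at the end with an early exit could look roughly like this.
The thresholds, the header name X-Geo-Check and the lookup callable are
hypothetical; the only real interface assumed is the Postfix policy delegation
reply, in which a policy server answers with an action=... line followed by an
empty line, and a PREPEND action adds a header to the message.

```python
REJECT_AT = 5.0   # hypothetical: total score at which mail is rejected
ACCEPT_AT = -5.0  # hypothetical: score low enough to skip further checks


def policy_reply(score, lookup_distance):
    """Build a Postfix policy reply; run the GeoIP check only when needed.

    lookup_distance is a callable returning (distance_km, extra_score). It is
    only called when the other checks have not already decided the outcome.
    """
    header = f"score={score:.2f}"
    if ACCEPT_AT < score < REJECT_AT:
        distance, extra = lookup_distance()
        score += extra
        header = f"score={score:.2f}, distance={distance:.0f}km"
    if score >= REJECT_AT:
        return "action=REJECT spam score too high\n\n"
    # PREPEND adds a header that later filters (such as DSPAM) can see.
    return f"action=PREPEND X-Geo-Check: {header}\n\n"
```

Moving the check earlier, as considered above, would simply mean calling the
lookup unconditionally so the distance always ends up in the header.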

I just have to play with it and see how well I can use that information.

Thanks, Michael, for reminding me about Geo-IP again. Without you I would not
have looked into it again.

// Steve
_______________________________________________
Dspam-user mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspam-user
