-------- Original Message --------
> Date: Mon, 03 Aug 2009 10:44:45 +0200
> From: Tom Hendrikx <[email protected]>
> To: Steve <[email protected]>
> CC: Michael Watkins <[email protected]>,
> [email protected]
> Subject: Re: [Dspam-user] RBL Configuration

> Steve wrote:
> > -------- Original Message --------
> >> Date: Fri, 31 Jul 2009 11:17:24 -0700
> >> From: "Michael Watkins" <[email protected]>
> >> To: [email protected]
> >> Subject: Re: [Dspam-user] RBL Configuration
> > 
> > btw: Since you are using Geo-IP... I could extend the Geo-IP patch to
> > allow scoring by distance. I got the idea after reading about SNARE
> > (http://www.technologyreview.com/communications/23086/). Calculating the
> > distance is actually pretty easy. Just out of curiosity I quickly coded
> > a Perl script using Geo::IP to extract the latitude and longitude of
> > your host (solutionroute.ca) and the same info for www.sourceforge.net,
> > display some info (so I know that I used Geo::IP correctly), and
> > then compute the distance in kilometers. This is the result:
> > -------
> > Info for www.sourceforge.net
> > Country Code:   US
> > Country Code3:  USA
> > Country Name:   United States
> > Region:         CA
> > Region (Name):  California
> > City:           Mountain View
> > Postal Code:    94041
> > Latitude:       37.3885
> > Longitude:      -122.0741
> > Time Zone:      America/Los_Angeles
> > Area Code:      650
> > Continent Code: NA
> > Continent Name: North America
> > Metro Code:     807
> > 
> > Info for solutionroute.ca
> > Country Code:   US
> > Country Code3:  USA
> > Country Name:   United States
> > Region:         MO
> > Region (Name):  Missouri
> > City:           Kansas City
> > Postal Code:    64106
> > Latitude:       39.1068
> > Longitude:      -94.5660
> > Time Zone:      America/Chicago
> > Area Code:      816
> > Continent Code: NA
> > Continent Name: North America
> > Metro Code:     616
> > 
> > Distance in Km: 2400.5724862323
> > -------
> > 
> > I used the freely available GeoLiteCity.dat
> > (http://geolite.maxmind.com/download/geoip/database/) to get the extended
> > data.
> > 
> > I have not added that to policyd-weight yet, but I am really tempted to
> > add it. What I don't know yet is how to build the lookup table. The problem I
> > see with the lookup table is that I only have the distance and I need to
> > score when a certain distance is reached, but look at this example:
> > ----
> > @distance_score = (
> >   # DISTANCE IN KM,   NO MATCH,  MATCH,  LOG NAME
> >   "1000",             -0.50,     0.50,   "1000_KM",
> >   "2000",             -0.50,     1.00,   "2000_KM",
> >   "4000",             -0.50,     1.50,   "4000_KM",
> >   "8000",             -0.50,     2.00,   "8000_KM",
> >   "16000",            -0.50,     2.50,   "16000_KM",
> > );
> > ----
> > 
> 
> I think this is a bit far-fetched. This expects the sysadmin to do some
> advanced math depending on his location.
>
What? It does not expect the system administrator to do advanced math. The
module itself would do the math. It just needs to know which longitude and
latitude to use as the starting point (for you that would probably be somewhere
in NL). And every good system administrator should know how to get his longitude
and latitude. That's no rocket science (every system administrator has surely
seen or used LOC records in DNS at some point).
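
To make that concrete, here is a minimal sketch of how a configured table could
be applied once the distance from the starting point is known. It is written in
Python purely for illustration and is not policyd-weight code. The tuple layout,
the names DISTANCE_SCORE, NO_MATCH_SCORE and score_distance, and the rule that
only the highest threshold reached counts are all assumptions modelled on the
@distance_score example quoted above.

```python
# Hypothetical band table modelled on the @distance_score example quoted above:
# (threshold in km, score if reached, log name). Not policyd-weight's real format.
DISTANCE_SCORE = [
    (1000, 0.50, "1000_KM"),
    (2000, 1.00, "2000_KM"),
    (4000, 1.50, "4000_KM"),
    (8000, 2.00, "8000_KM"),
    (16000, 2.50, "16000_KM"),
]
NO_MATCH_SCORE = -0.50  # assumption: applied once when no threshold is reached


def score_distance(distance_km, table=DISTANCE_SCORE, no_match=NO_MATCH_SCORE):
    """Return (score, log_name) for the highest threshold the distance reaches."""
    result = (no_match, "NOT_DISTANT")  # "NOT_DISTANT" is a made-up log name
    for threshold_km, score, log_name in sorted(table):
        if distance_km >= threshold_km:
            result = (score, log_name)
        else:
            break
    return result


print(score_distance(2400.57))  # prints (1.0, '2000_KM')
```

The administrator only supplies the starting coordinates and, if wanted, the
thresholds; everything else is a lookup.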


> For Europe, major spam
> locations (e.g. BR, US, CN) are "far away", but when you're in the US, the
> table above does not work. Recalculation of the table is then based upon
> already known data: known spam countries.
> 
Please read the SNARE study mentioned above. Especially this part here:
-----
Furthermore, the researchers found that by plotting the geodesic distance 
between the Internet Protocol (IP) addresses of the sender and 
receiver--measured on the curved surface of the earth--they could determine 
whether the message was junk.
-----

Besides the conclusion of the SNARE study: no one forces you (whether you are
from the States or any other country) to use the distance computation module. It
is up to you whether you use it or not. If you are indeed from the States,
then ASN lookups and country lookups are probably the better choice for you.


> Classifying on country is cheaper in cpu-time, and enables you to
> actually target known sources.
> 
The computation of the distance is cheap in CPU time. It is just a handful of
acos, cos, sin, etc. calls plus a few multiplications, additions and
subtractions. In fact it is just one line of Perl code.
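
For illustration, a great-circle distance based on the spherical law of cosines
can be sketched as follows. This is Python rather than the Perl script mentioned
above, which is not shown in this thread. The function name distance_km and the
Earth radius of 6371 km are assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption, other values are in use


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km via the spherical law of cosines."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    cos_angle = (math.sin(p1) * math.sin(p2)
                 + math.cos(p1) * math.cos(p2) * math.cos(dlon))
    # Clamp against rounding error so acos never sees a value outside [-1, 1].
    return EARTH_RADIUS_KM * math.acos(max(-1.0, min(1.0, cos_angle)))


# Coordinates taken from the Geo::IP output quoted earlier in this thread
# (Mountain View and Kansas City).
print(distance_km(37.3885, -122.0741, 39.1068, -94.5660))
```

With these inputs the result should land close to the roughly 2400 km reported
above; the exact digits depend on the radius chosen.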


> For example:
> - get the sender's ISP name from whois
> - get results for google image search "ISPname sysadmin"
> - use face recognition
> - measure sysadmins beard length
> - more beard -> lower spam score
> 
> Just a random example to show that you can do really cool stuff with
> statistics (use your imagination!), but without much actual use. Without
> a doubt, it is a cool idea to compute real life distance, but I think
> that relevance vs. efficiency is a bit off :)
> 
You are splitting hairs with an imaginary example. Where is the study showing that
the above computations are relevant in SPAM fighting?

I did not come up with the idea of using real-life distance by myself. The
study mentioned above analyzed 25 million SPAM messages, and real-life
distance is one of the metrics it used. That is the reason I
thought of using it in policyd-weight.

It is just one of the pieces of information one could use in a weighted system like
policyd-weight, much like the info you get from SenderBase. For example, one
could use the info for your two MX servers to get the longitude and latitude
and much, much more, and use that info to make some really nice decisions:
----
nyx ~ # dig +short in txt 147.194.149.217.test.senderbase.org
"0-0=1|1=InterNLnet B.V. Nijmegen|2=4.4|3=4.8|4=1092588|6=1097801621|7=18|8=896|9=12|20=meredith.|21=tomhendrikx.nl|22=Y|24=1.5|25=1225401180|41=1.5|43=0.5|44=0.90|45=N|46=22|48=24|49=1.00|50=Nijmegen|51=03|53=NL|54=5.8667|55=51.8333"
nyx ~ #
----
nyx ~ # dig +short in txt 76.31.215.82.test.senderbase.org
"0-0=1|1=InterNLnet B.V.|2=5.0|3=5.1|4=1092591|6=1097803824|7=5|8=57968|9=97|41=1.0|43=0.2|44=0.60|45=N|46=18|48=24|49=1.00|50=Utrecht|51=09|53=NL|54=5.1333|55=52.0833"
nyx ~ #
----
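
Pulling the coordinates out of such a record takes only a few lines. The sketch
below is Python for illustration and makes no network calls. The helper names
parse_senderbase and reversed_query_name are made up, and the field meanings
(50 = city, 53 = country, 54 = longitude, 55 = latitude) are only inferred from
the two sample answers above, not taken from SenderBase documentation. The
query name is assumed to follow the usual reversed-octet convention.

```python
def parse_senderbase(txt):
    """Split a 'key=value|key=value' TXT payload into a dict of strings."""
    fields = {}
    for item in txt.strip().strip('"').split("|"):
        key, sep, value = item.partition("=")
        if sep:  # skip anything that is not a key=value pair
            fields[key] = value
    return fields


def reversed_query_name(ipv4, zone="test.senderbase.org"):
    """Build the reversed-octet query name, e.g. 1.2.3.4 -> 4.3.2.1.<zone>."""
    return ".".join(reversed(ipv4.split("."))) + "." + zone


# Shortened copy of the first record shown above.
record = '"0-0=1|1=InterNLnet B.V. Nijmegen|50=Nijmegen|53=NL|54=5.8667|55=51.8333"'
fields = parse_senderbase(record)
# Assumption inferred from the sample: 54 looks like longitude, 55 like latitude.
longitude, latitude = float(fields["54"]), float(fields["55"])
print(fields["50"], fields["53"], latitude, longitude)
print(reversed_query_name("217.149.194.147"))
```

The latitude and longitude obtained this way can feed straight into a distance
calculation.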

You can read up on how Cisco uses that information from
SenderBase in its IronPort products to help compute a reputation score.

And getting that kind of data is ultra cheap: just one single DNS lookup. No
need to go and do crazy face recognition and such.

But hey! I just tried to be helpful by offering to include the computation of
real distance in policyd-weight. I did not say that the computation would be
the next big thing in preventing SPAM on your system. Okay. Since this is the DSPAM
mailing list, I should stop posting about other solutions here...


> --
> Regards,
>       Tom
> 
// Steve

