-------- Original Message --------
> Date: Mon, 03 Aug 2009 10:44:45 +0200
> From: Tom Hendrikx <[email protected]>
> To: Steve <[email protected]>
> CC: Michael Watkins <[email protected]>,
>  [email protected]
> Subject: Re: [Dspam-user] RBL Configuration

> Steve wrote:
> > -------- Original Message --------
> >> Date: Fri, 31 Jul 2009 11:17:24 -0700
> >> From: "Michael Watkins" <[email protected]>
> >> To: [email protected]
> >> Subject: Re: [Dspam-user] RBL Configuration
> >
> > btw: Since you are using Geo-IP... I could extend the Geo-IP patch to
> > allow scoring by distance. I got that idea after reading about SNARE
> > (http://www.technologyreview.com/communications/23086/). It is actually
> > pretty easy to calculate the distance. Just out of curiosity I quickly
> > coded a Perl script using Geo::IP to extract the latitude and longitude
> > of your host (solutionroute.ca) and the same info for
> > www.sourceforge.net, then display some info (so I know that I used
> > Geo::IP correctly) and then compute the distance in kilometers. This is
> > the result:
> > -------
> > Info for www.sourceforge.net
> > Country Code: US
> > Country Code3: USA
> > Country Name: United States
> > Region: CA
> > Region (Name): California
> > City: Mountain View
> > Postal Code: 94041
> > Latitude: 37.3885
> > Longitude: -122.0741
> > Time Zone: America/Los_Angeles
> > Area Code: 650
> > Continent Code: NA
> > Continent Name: North America
> > Metro Code: 807
> >
> > Info for solutionroute.ca
> > Country Code: US
> > Country Code3: USA
> > Country Name: United States
> > Region: MO
> > Region (Name): Missouri
> > City: Kansas City
> > Postal Code: 64106
> > Latitude: 39.1068
> > Longitude: -94.5660
> > Time Zone: America/Chicago
> > Area Code: 816
> > Continent Code: NA
> > Continent Name: North America
> > Metro Code: 616
> >
> > Distance in Km: 2400.5724862323
> > -------
> >
> > I used the freely available GeoLiteCity.dat
> > (http://geolite.maxmind.com/download/geoip/database/) to get the
> > extended data.
> >
> > I have not added that to policyd-weight yet, but I am really tempted to
> > add it. What I don't know yet is how to build the lookup table.
> > The problem I see with the lookup table is that I just have the
> > distance and I need to score if a certain distance is reached, but
> > look at this example:
> > ----
> > @distance_score = (
> > # DISTANCE IN KM, NO MATCH, MATCH, LOG NAME
> > "1000", -0.50, 0.50, "1000_KM",
> > "2000", -0.50, 1.00, "2000_KM",
> > "4000", -0.50, 1.50, "4000_KM",
> > "8000", -0.50, 2.00, "8000_KM",
> > "16000", -0.50, 2.50, "16000_KM",
> > );
> > ----
>
> I think this is a bit far-fetched. This expects the sysadmin to do some
> advanced math depending on his location.

What? It does not expect the system administrator to do advanced math.
The module itself would do the math. It just needs to know which
longitude and latitude to use as the starting point (for you that would
probably be somewhere in NL). And every good system administrator should
know how to get his longitude and latitude. That's not rocket science
(every system administrator has surely seen or used LOC records in DNS
at some point).

> For Europe, major spam
> locations (e.g. BR, US, CN) are "far away", but when you're in the US,
> the table above does not work. Recalculation of the table is then based
> upon already known data: known spam countries.

Please read the SNARE study mentioned above, especially this part:
-----
Furthermore, the researchers found that by plotting the geodesic distance
between the Internet Protocol (IP) addresses of the sender and
receiver--measured on the curved surface of the earth--they could
determine whether the message was junk.
-----
Besides the conclusion from the SNARE study: no one forces you (whether
you are in the States or any other country) to use the distance
computation module. It is up to you whether you use it or not. If you are
indeed in the States, then ASN lookups and country lookups are probably
the better choice for you.

> Classifying on country is cheaper in cpu-time, and enables you to
> actually target known sources.

The computation of the distance is cheap in CPU time.
It is just a bunch of acos, cos, sin, etc. calls plus a few
multiplications, additions and subtractions. In fact it is just one line
of Perl code.

> For example:
> - get the sender's ISP name from whois
> - get results for google image search "ISPname sysadmin"
> - use face recognition
> - measure sysadmins beard length
> - more beard -> lower spam score
>
> Just a random example to show that you can do really cool stuff with
> statistics (use your imagination!), but without much actual use. Without
> a doubt, it is a cool idea to compute real life distance, but I think
> that relevance vs. efficiency is a bit off :)

You are arguing from an imaginary example. Where is the study showing
that the above computations are relevant in spam fighting? I did not come
up with the idea of using the real-life distance by myself. The study
mentioned above analyzed 25 million spam messages, and the real-life
distance is one of the metrics it used. That is why I thought of using it
in policyd-weight. It is just one piece of information one could use in a
weighted system like policyd-weight, much like the info you get from
SenderBase. For example, one could use the info from your two MX servers
to get the longitude and latitude and much, much more, and use that info
to make some really nice decisions:
----
nyx ~ # dig +short in txt 147.194.149.217.test.senderbase.org
"0-0=1|1=InterNLnet B.V. Nijmegen|2=4.4|3=4.8|4=1092588|6=1097801621|7=18|8=896|9=12|20=meredith.|21=tomhendrikx.nl|22=Y|24=1.5|25=1225401180|41=1.5|43=0.5|44=0.90|45=N|46=22|48=24|49=1.00|50=Nijmegen|51=03|53=NL|54=5.8667|55=51.8333"
nyx ~ #
----
nyx ~ # dig +short in txt 76.31.215.82.test.senderbase.org
"0-0=1|1=InterNLnet B.V.|2=5.0|3=5.1|4=1092591|6=1097803824|7=5|8=57968|9=97|41=1.0|43=0.2|44=0.60|45=N|46=18|48=24|49=1.00|50=Utrecht|51=09|53=NL|54=5.1333|55=52.0833"
nyx ~ #
----
You can read up on how Cisco uses that SenderBase information in their
IronPort products to help compute a reputation score. And getting that
kind of data is ultra cheap: just one single DNS lookup. No need to go
and do crazy face recognition and such.

But hey! I just tried to be helpful by offering to include the
computation of real distance in policyd-weight. I did not say that the
computation would be the next big thing preventing spam on your system.

Okay. Since this is the DSPAM mailing list, I should stop posting about
other solutions here...

> --
> Regards,
> Tom

// Steve
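
For reference, the distance computation and the threshold table discussed
in this thread can be sketched as follows. This is a minimal Python
sketch, not the Perl script mentioned above (which is not shown here). It
uses the haversine formula instead of the acos-based one the post alludes
to; both give the great-circle distance on a sphere. The Earth radius of
6371 km, the function names and the rule "apply the MATCH score of the
largest threshold reached" are assumptions made for illustration only.

```python
import math

# Assumption: mean Earth radius in km; the value used by the original script is not shown.
EARTH_RADIUS_KM = 6371.0


def great_circle_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points in decimal degrees (haversine)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Rows copied from the @distance_score example:
# (distance in km, NO MATCH score, MATCH score, log name)
DISTANCE_SCORE = [
    (1000, -0.50, 0.50, "1000_KM"),
    (2000, -0.50, 1.00, "2000_KM"),
    (4000, -0.50, 1.50, "4000_KM"),
    (8000, -0.50, 2.00, "8000_KM"),
    (16000, -0.50, 2.50, "16000_KM"),
]


def score_distance(distance_km, table=DISTANCE_SCORE):
    """Hypothetical rule: MATCH score of the largest threshold reached,
    otherwise the NO MATCH score of the first row (with no log name)."""
    score, name = table[0][1], None
    for threshold, _no_match, match, log_name in table:
        if distance_km >= threshold:
            score, name = match, log_name
    return score, name


# Coordinates taken from the Geo::IP output quoted above.
d = great_circle_km(37.3885, -122.0741, 39.1068, -94.5660)
print(round(d, 1), score_distance(d))
```

With the Mountain View and Kansas City coordinates from the quoted output,
the result should land close to the 2400.57 km reported there, which falls
into the "2000_KM" row under the assumed rule.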

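The SenderBase TXT answers shown above are plain "key=value" pairs
separated by "|", so extracting the coordinates takes only a few lines.
The sketch below parses the second record from this mail without doing
any DNS lookup. Treating field 54 as longitude and field 55 as latitude
is an inference from the values (they match Nijmegen and Utrecht), not
something stated in this thread, and the function name is made up for
illustration.

```python
def parse_senderbase_txt(record):
    """Split a 'key=value|key=value' TXT answer into a dict of strings."""
    fields = {}
    for part in record.strip('"').split("|"):
        key, sep, value = part.partition("=")
        if sep:  # skip anything that is not a key=value pair
            fields[key] = value
    return fields


# Second TXT answer copied from the dig output above.
record = ('"0-0=1|1=InterNLnet B.V.|2=5.0|3=5.1|4=1092591|6=1097803824|7=5|'
          '8=57968|9=97|41=1.0|43=0.2|44=0.60|45=N|46=18|48=24|49=1.00|'
          '50=Utrecht|51=09|53=NL|54=5.1333|55=52.0833"')

fields = parse_senderbase_txt(record)

# Assumption: 54 = longitude, 55 = latitude (inferred from the values).
lon, lat = float(fields["54"]), float(fields["55"])
print(fields["50"], fields["53"], lat, lon)
```

The resulting latitude and longitude could then be fed into whatever
distance computation the scoring module uses.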