Hi all,

From what I understand / was told, this happens once a day (or relatively
infrequently), and you want to avoid searching through all the geo data per
IP (since you are grouping the requests by IP).
If that's the case, it would be better to use a separate DB table to cache
this data (IP, geoID, etc.) with the IP as the primary key (which would
improve the lookup time). Even though there will be cache misses, the
cache-miss-to-hit ratio would eventually drop. A DB cache would be better
since you do want to persist this data for use over time. BTW, on a cache
miss, if we can figure out a way to limit the search range on the original
table, or at least stop the search once a match is found, it would greatly
improve the cache-miss time as well.

That's my two cents.

Cheers,
Sachith

On Sun, Mar 6, 2016 at 8:24 AM, Janaka Ranabahu <[email protected]> wrote:

> Hi Sanjeewa,
>
> On Sun, Mar 6, 2016 at 7:25 AM, Sanjeewa Malalgoda <[email protected]>
> wrote:
>
>> Implementing a cache is better than having another table mapping, IMO.
>> What if we query the database and keep the IP range and network name in
>> memory? Then we could do a quick search on the network name and, based
>> on that, load the rest some other way.
>> WDYT?
>
> We thought of having an in-memory cache, but we faced several issues
> along the way. Let me explain the situation as it is now.
>
> The Max-Mind DB has the IP addresses as an IP plus a netmask.
> Ex: 192.168.0.0/20
>
> The calculation of the IP address range would be like the following.
>
> Address:   192.168.0.1          11000000.10101000.0000 0000.00000001
> Netmask:   255.255.240.0 = 20   11111111.11111111.1111 0000.00000000
> Wildcard:  0.0.15.255           00000000.00000000.0000 1111.11111111
> =>
> Network:   192.168.0.0/20       11000000.10101000.0000 0000.00000000 (Class C)
> Broadcast: 192.168.15.255       11000000.10101000.0000 1111.11111111
> HostMin:   192.168.0.1          11000000.10101000.0000 0000.00000001
> HostMax:   192.168.15.254       11000000.10101000.0000 1111.11111110
> Hosts/Net: 4094                 (Private Internet <http://www.ietf.org/rfc/rfc1918.txt>)
>
> Therefore, what we are currently doing is to calculate the start and end
> IP for all the values in the Max-Mind DB and alter the tables with those
> values initially (this is a one-time operation). When the Spark script
> executes, we check whether the given IP is between any of the start and
> end ranges in the tables. That is the reason why it is taking a long
> time to fetch results for a given IP.
>
> As a solution for this, we discussed what Tharindu has mentioned:
> 1. Have an in-memory caching mechanism.
> 2. Have a DB-based caching mechanism.
>
> The only point that we have to highlight is the fact that in both of the
> above mechanisms we need to cache the IP address (not the IP + netmask
> as it is in the Max-Mind DB) against the geo location.
>
> Ex:
> For 192.168.0.1    - Colombo, Sri Lanka
> For 192.168.15.254 - Colombo, Sri Lanka
>
> So as per the above example, if there are requests from all 4094
> possible addresses, we will be caching each IP with the geo location
> (since introducing range queries in a cache is not a good practice).
>
> Please find my comments about both approaches.
>
> 1. Having an in-memory cache would speed things up, but based on the IPs
> in the data set, there could be a number of entries for IPs in the same
> range. One problem with this approach is that, if there is a server
> restart, the initial script execution would take a lot of time.
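The one-time start/end pre-computation described above can be sketched with
Python's standard `ipaddress` module. This is illustrative only (the actual
implementation alters the Max-Mind tables); it reproduces the numbers in the
192.168.0.0/20 example:

```python
import ipaddress

def ip_range(cidr):
    """Return (start_ip, end_ip, usable_host_count) for a CIDR block."""
    net = ipaddress.ip_network(cidr)
    return (str(net.network_address),       # start of range (network address)
            str(net.broadcast_address),     # end of range (broadcast address)
            net.num_addresses - 2)          # hosts, excluding network/broadcast

start, end, hosts = ip_range("192.168.0.0/20")
print(start, end, hosts)  # 192.168.0.0 192.168.15.255 4094

# For the range check in the Spark script, the bounds would be stored as
# integers so the lookup becomes "start_int <= ip_int <= end_int".
net = ipaddress.ip_network("192.168.0.0/20")
start_int, end_int = int(net.network_address), int(net.broadcast_address)
print(start_int, end_int)  # 3232235520 3232239615
```

Storing the bounds as integers is what makes the BETWEEN-style range query
possible at all, but as noted above it still forces a scan over the range
rows rather than an indexed point lookup.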
> Also, in certain scenarios (a high number of distinct IPs), the cache
> would not have a significant effect on script execution performance.
>
> 2. Having a DB-based cache would persist the data even across a restart,
> and the data-fetching query would search for a specific value (not a
> range query as against the Max-Mind DB). But the downside is that on a
> cache miss there would be a minimum of three DB queries (one for the
> cache table lookup, one for the Max-Mind DB lookup, and one to persist
> the result in the cache).
>
> That is why we have initiated this thread: to finalize the caching
> approach we should take.
>
> Thanks,
> Janaka
>
>> Thanks,
>> sanjeewa.
>>
>> On Fri, Mar 4, 2016 at 3:12 PM, Tharindu Dharmarathna
>> <[email protected]> wrote:
>>
>>> Hi All,
>>>
>>> We are going to implement a client-IP-based geo-location graph in API
>>> Manager Analytics. Having gone through the options in [1], we selected
>>> [2] as the most suitable approach.
>>>
>>> *Overview of Max-Mind's DB*
>>>
>>> As per the structure of the DB (attached as an image), there are two
>>> tables which combine to give the location:
>>>
>>> Find the geoname_id according to the network, then get the Country and
>>> City from the locations table.
>>>
>>> *Limitations*
>>>
>>> With their database dump, we cannot look up an IP in those tables
>>> directly. We need to check whether the given IP is between the
>>> network's min and max IPs. This query takes a long time (10 seconds
>>> even on indexed data). If we do this directly from the Spark script
>>> for each and every IP in the summary table (regardless of whether the
>>> same IP appears in multiple rows), it will query the tables
>>> repeatedly. Therefore this will have a performance impact on the
>>> graph.
>>>
>>> *Solution*
>>>
>>> 1. Implement an LRU cache of IP address vs. location.
>>>
>>> This will need to be implemented in a custom UDF in Spark.
>>> If the IP queried from Spark is available in the cache, the location
>>> is returned from there; if not, it is retrieved from the DB and put
>>> into the cache.
>>>
>>> 2. Persist in a table
>>>
>>> Use the IP as the primary key, with Country and City as other columns,
>>> and retrieve data from that table.
>>>
>>> Please feel free to suggest the most suitable way of doing this.
>>>
>>> [1] - Implementing Geographical based Analytics in API Manager mail
>>> thread.
>>> [2] - http://dev.maxmind.com/geoip/geoip2/geolite2/
>>>
>>> *Thanks*
>>>
>>> *Tharindu Dharmarathna*
>>> Associate Software Engineer
>>> WSO2 Inc.; http://wso2.com
>>> lean.enterprise.middleware
>>>
>>> mobile: +94779109091
>>
>> --
>> *Sanjeewa Malalgoda*
>> WSO2 Inc.
>> Mobile: +94713068779
>> blog: http://sanjeewamalalgoda.blogspot.com/
>
> --
> *Janaka Ranabahu*
> Associate Technical Lead, WSO2 Inc.
> http://wso2.com
>
> E-mail: [email protected]
> M: +94 718370861
>
> Lean . Enterprise . Middleware

--
Sachith Withana
Software Engineer; WSO2 Inc.; http://wso2.com
E-mail: sachith AT wso2.com
M: +94715518127
Linked-In: https://lk.linkedin.com/in/sachithwithana
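Putting the proposals in this thread together, the lookup path could be
sketched as below. This is a rough, hypothetical sketch, not the production
code: an in-memory SQLite database stands in for the real datastore, and all
table and column names (`geo_blocks`, `ip_cache`) are invented for
illustration. It layers a small in-memory LRU cache (solution 1) in front of
a DB cache table keyed by IP (solution 2), falling back to the slow range
scan on a miss and stopping at the first match, as Sachith suggested:

```python
import sqlite3
from collections import OrderedDict

# In-memory stand-in for the real datastore; names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE geo_blocks (start_ip INTEGER, end_ip INTEGER, location TEXT)")
conn.execute("CREATE TABLE ip_cache (ip INTEGER PRIMARY KEY, location TEXT)")

# One pre-computed range row, as produced by the one-time table alteration
# (192.168.0.0/20 -> integers 3232235520..3232239615).
conn.execute("INSERT INTO geo_blocks VALUES (?, ?, ?)",
             (3232235520, 3232239615, "Colombo, Sri Lanka"))

def ip_to_int(ip):
    """Convert a dotted-quad IPv4 address to its integer form."""
    a, b, c, d = (int(x) for x in ip.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d

LRU_CAPACITY = 10000            # illustrative size for the in-memory cache
lru = OrderedDict()             # insertion-ordered dict used as an LRU cache

def lookup(ip):
    key = ip_to_int(ip)
    # 1. In-memory LRU hit: cheapest path, lost on restart.
    if key in lru:
        lru.move_to_end(key)
        return lru[key]
    # 2. DB cache table hit: a primary-key point lookup, survives restarts.
    row = conn.execute("SELECT location FROM ip_cache WHERE ip = ?",
                       (key,)).fetchone()
    if row is None:
        # 3. Full miss: the slow range scan, stopped at the first match,
        #    then persisted so the next miss for this IP is a point lookup.
        row = conn.execute(
            "SELECT location FROM geo_blocks "
            "WHERE ? BETWEEN start_ip AND end_ip LIMIT 1", (key,)).fetchone()
        if row is not None:
            conn.execute("INSERT OR IGNORE INTO ip_cache VALUES (?, ?)",
                         (key, row[0]))
    location = row[0] if row else None
    lru[key] = location
    if len(lru) > LRU_CAPACITY:
        lru.popitem(last=False)  # evict the least-recently-used entry
    return location

print(lookup("192.168.0.1"))    # Colombo, Sri Lanka (via range scan)
print(lookup("192.168.0.1"))    # Colombo, Sri Lanka (via LRU cache)
```

The three numbered steps mirror the three-query worst case Janaka describes
for a cache miss; the LRU layer only reduces how often the DB is touched at
all, while the `ip_cache` table is what keeps misses cheap across restarts.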
_______________________________________________
Architecture mailing list
[email protected]
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
