Hi Tharindu,

Great work. Can we do a performance test of this and share the results?
Basically, what we need to check is how long a script takes to execute.

Thanks,
Janaka

On Thu, Mar 17, 2016 at 2:37 PM, Tharindu Dharmarathna <[email protected]>
wrote:

> Hi All,
>
> After going through the above discussion, we have implemented a pluggable
> user-defined extension point. With this configuration, we can write our own
> implementation to get the country and state of a given IP.
>
> *Caching Implementation*
>
> We defined two levels of caching, as described below.
>
> When an IP address is checked from the *UDF*, it first checks the cache for
> the location information. If the IP is not in the cache, it checks another
> database table that contains a direct IP-to-location mapping, as *Sajith*
> mentioned. If the IP is there, the location is returned and cached. If the
> location is not in that table, the IP is checked against the *MAXMIND*
> database, and the location is stored in both the cache and the above table.
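A minimal Java sketch of the two-level lookup described above, under stated assumptions: the cache is a plain `HashMap` for brevity, the IP-to-location table is stood in for by another `Map`, and the MaxMind query by a `Function`. The class name `TwoLevelGeoLookup` and its members are hypothetical, not the actual UDF code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of the two-level lookup; not the actual UDF implementation.
class TwoLevelGeoLookup {
    private final Map<String, String> cache = new HashMap<>(); // level 1: in-memory cache
    private final Map<String, String> ipTable;                 // level 2: stand-in for the IP -> location table
    private final Function<String, String> maxMindLookup;      // fallback: stand-in for the MaxMind query

    TwoLevelGeoLookup(Map<String, String> ipTable, Function<String, String> maxMindLookup) {
        this.ipTable = ipTable;
        this.maxMindLookup = maxMindLookup;
    }

    String locate(String ip) {
        String location = cache.get(ip);
        if (location != null) {
            return location;                    // served from the cache
        }
        location = ipTable.get(ip);             // direct IP -> location mapping
        if (location == null) {
            location = maxMindLookup.apply(ip); // slowest path: MaxMind database
            if (location == null) {
                return null;                    // unknown IP: nothing to store
            }
            ipTable.put(ip, location);          // store in the table for future runs
        }
        cache.put(ip, location);                // store in the cache for this run
        return location;
    }
}
```

On a miss at both levels, the location is written to the table and the cache, so a second `locate` for the same IP never reaches the MaxMind stub.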
>
> Thanks
> Tharindu
>
>
> On Tue, Mar 8, 2016 at 2:34 PM, Tharindu Dharmarathna <[email protected]>
> wrote:
>
>> Hi All,
>>
>> We have come across the following ways to do the above task after the
>> initial POC.
>>
>> 1. Use the file-type database provided by max-mind (.mmdb) and its
>> database readers.
>>
>> With this approach, we got a lower lookup time when fetching the location
>> through a JAX-RS service that wraps the above database. By default, this
>> JAX-RS implementation uses max-mind's cache implementation, which can be
>> found at [1].
>>
>> *Limitations*
>>
>>
>>    - The JAX-RS app has to be hosted on another server.
>>    - The number of HTTP calls will be high.
>>
>>
>> 2. Call the query server as described earlier in this thread and cache the
>> location against the IP.
>>
>> Here is the execution time of a single query for each method:
>>
>>
>> *Method 1: 4.5 seconds*
>>
>> *Method 2: 4.76 seconds*
>>
>>
>> Thanks
>> Tharindu
>>
>>
>> On Tue, Mar 8, 2016 at 8:29 AM, Lasantha Fernando <[email protected]>
>> wrote:
>>
>>> Hi Tharindu,
>>>
>>> On 7 March 2016 at 21:10, Sajith Ravindra <[email protected]> wrote:
>>>
>>>>
>>>> 2. Having a DB-based cache would persist the data even on a restart, and
>>>>> the data-fetching query would be searching for a specific value (not a
>>>>> range query as against the max-mind DB). But the downside is that for a
>>>>> cache miss there would be minimum 3 DB queries (one for the cache table
>>>>> lookup and one for the max-mind db lookup and one for the
>>>>> cache persistence).
>>>>>
>>>>
>>>> In order to avoid expensive cache misses, we may eagerly populate the DB
>>>> table cache, i.e., when there's a cache miss, we do the lookup in the
>>>> max-mind DB and then add entries for multiple IPs of that network_cidr to
>>>> the cache DB table instead of only for that particular IP. That way we
>>>> reduce the chance of a cache miss being very expensive, as we increase the
>>>> chance of the IP being found on the first DB lookup.
>>>>
>>>> We might need to do some evaluation to determine how many entries we
>>>> should add to the DB cache for IPs belonging to a particular network_cidr.
>>>> For example, if requests from a certain network_cidr are frequent, we may
>>>> want to add more entries compared to a less frequent network_cidr.
>>>>
>>>> The downside is that the DB cache tends to grow larger.
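One way to sketch the eager population Sajith describes, assuming the MaxMind lookup on a miss also tells us the matching network (network address and prefix length). The class `EagerCachePopulator` and the `extra` parameter, which stands in for the per-network entry count that the evaluation above would tune, are hypothetical; IPv4 addresses are held as `long` values.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of eager cache population on a miss (IPv4 only).
class EagerCachePopulator {
    // Stand-in for the cache DB table: IPv4 address (as long) -> location.
    final Map<Long, String> cacheTable = new HashMap<>();

    /**
     * Called after a miss for `ip` was resolved via MaxMind to networkIp/prefix.
     * Adds `ip` plus up to `extra` following addresses of the same network,
     * never running past the network's broadcast address.
     * Assumes `ip` lies inside networkIp/prefix.
     */
    void populate(long ip, long networkIp, int prefix, String location, int extra) {
        long blockSize = 1L << (32 - prefix);        // addresses in the network
        long lastIp = networkIp + blockSize - 1;     // broadcast address
        long end = Math.min(lastIp, ip + extra);     // clip to the network
        for (long a = ip; a <= end; a++) {
            cacheTable.putIfAbsent(a, location);     // keep any existing entry
        }
    }
}
```

A larger `extra` trades a bigger cache table for fewer expensive misses, which is the downside noted above.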
>>>>
>>>> Thanks
>>>> *Sajith Ravindra*
>>>> Senior Software Engineer
>>>> WSO2 Inc.; http://wso2.com
>>>> lean.enterprise.middleware
>>>>
>>>> mobile: +94 77 2273550
>>>> blog: http://sajithr.blogspot.com/
>>>>
>>>> On Mon, Mar 7, 2016 at 4:37 AM, Tharindu Dharmarathna <
>>>> [email protected]> wrote:
>>>>
>>>>> Hi Lasantha,
>>>>>
>>>>> Up to now, we have been doing the following to get the geolocation from
>>>>> the stated dump.
>>>>>
>>>>> 1. We added two columns filled with the long values of the lower and
>>>>> upper network IP addresses. Then we get the geoname_id for the given IP
>>>>> whose long value falls between these two long values. Hope you get the
>>>>> idea of our approach. Is there any way to do a bitwise operation to get
>>>>> the network_cidr value?
>>>>>
>>>>
>>> Can't we do it by keeping the network IP and the subnet mask (prefix
>>> length) as two columns and the geoname_id as the third? For example, if
>>> 192.168.0.0/20 is the CIDR, what is usually done for IPv4 routing is to get
>>> the IP as an int, do a bitwise AND with the subnet mask (a prefix length of
>>> 20 means 20 bits set to 1 and the remaining 12 bits set to 0, i.e.
>>> 11111111 11111111 11110000 00000000), and check whether the result equals
>>> the network IP.
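The match Lasantha describes can be sketched in a few lines of Java. `CidrMatch` and its methods are illustrative helpers, not an existing library API; only IPv4 is handled, and `networkIp` is assumed to have its host bits already zeroed, as in the network column.

```java
// Illustrative IPv4-only helpers; not an existing library API.
class CidrMatch {
    // "192.168.0.1" -> its 32-bit pattern in an int (negative when read as signed)
    static int toInt(String ip) {
        String[] parts = ip.split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("Not an IPv4 address: " + ip);
        }
        int value = 0;
        for (String part : parts) {
            int octet = Integer.parseInt(part);
            if (octet < 0 || octet > 255) {
                throw new IllegalArgumentException("Bad octet in " + ip);
            }
            value = (value << 8) | octet;
        }
        return value;
    }

    // `prefix` leading 1 bits, e.g. 20 -> 0xFFFFF000.
    // Java masks int shift counts to 5 bits, so /0 must be special-cased.
    static int mask(int prefix) {
        return prefix == 0 ? 0 : -1 << (32 - prefix);
    }

    // (ip AND mask) == network IP, exactly as described above.
    static boolean inNetwork(String ip, String networkIp, int prefix) {
        return (toInt(ip) & mask(prefix)) == toInt(networkIp);
    }
}
```

With this, `CidrMatch.inNetwork("192.168.15.254", "192.168.0.0", 20)` is true, while `"192.168.16.1"` falls outside the /20.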
>>>
>>> You might find more info here [1]. I think there should be libraries
>>> that wrap this operation. But if performance is a concern and we need to
>>> keep the cache search implementation very lean, we can implement it
>>> ourselves.
>>>
>>> WDYT?
>>>
>>> [1]
>>> http://stackoverflow.com/questions/4209760/validate-an-ip-address-with-mask
>>>
>>> Thanks,
>>> Lasantha
>>>
>>>
>>>>> Thanks
>>>>> Tharindu
>>>>>
>>>>> On Mon, Mar 7, 2016 at 12:05 AM, Lasantha Fernando <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I think what Sachith suggests also makes sense. But I am also rooting
>>>>>> for the in-memory cache implementation suggested by Sanjeewa with the
>>>>>> IP-netmask approach.
>>>>>>
>>>>>> Please find my comments inline.
>>>>>>
>>>>>> On 5 March 2016 at 23:50, Sachith Withana <[email protected]> wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> From what I understand/was told, this happens once a day (or
>>>>>>> relatively infrequently), and you want to avoid searching through all
>>>>>>> the geo data per IP (since you are grouping the requests by IP).
>>>>>>>
>>>>>>> If that's the case, it would be better to use a separate DB table to
>>>>>>> cache this data (IP, geoID, etc.) with the IP as the primary key (which
>>>>>>> would improve the lookup time). Even though there will be cache misses,
>>>>>>> the miss-to-hit ratio (#cacheMisses/#cacheHits) would drop over time.
>>>>>>>
>>>>>>> Having a DB cache would be better since you do want to persist these
>>>>>>> data to be used over time.
>>>>>>>
>>>>>>> BTW, on a cache miss, if we can figure out a way to limit the search
>>>>>>> range on the original table or at least stop the search once a match is
>>>>>>> found, it would greatly improve the cache-miss time as well.
>>>>>>>
>>>>>>> That's my two cents.
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Sachith
>>>>>>>
>>>>>>> On Sun, Mar 6, 2016 at 8:24 AM, Janaka Ranabahu <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Sanjeewa,
>>>>>>>>
>>>>>>>> On Sun, Mar 6, 2016 at 7:25 AM, Sanjeewa Malalgoda <
>>>>>>>> [email protected]> wrote:
>>>>>>>>
>>>>>>>>> IMO, implementing a cache is better than having another mapping
>>>>>>>>> table. What if we query the database and keep the IP ranges and
>>>>>>>>> network names in memory? Then we can do a quick search on the network
>>>>>>>>> name, and based on that the rest can be loaded some other way.
>>>>>>>>> WDYT?
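A sketch of Sanjeewa's in-memory idea, assuming the ranges are precomputed as `long` start/end values (as described elsewhere in this thread) and that networks do not overlap. A `TreeMap` keyed by range start finds the only candidate range in O(log n) with `floorEntry`, so the lookup stops at one range instead of scanning. `RangeIndex` is a hypothetical name.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory index of non-overlapping IPv4 ranges, keyed by range start.
class RangeIndex {
    private static final class Range {
        final long end;
        final String geonameId;

        Range(long end, String geonameId) {
            this.end = end;
            this.geonameId = geonameId;
        }
    }

    private final TreeMap<Long, Range> byStart = new TreeMap<>();

    void add(long start, long end, String geonameId) {
        byStart.put(start, new Range(end, geonameId));
    }

    // Greatest start <= ip is the only range that can contain ip; then check its end.
    String lookup(long ip) {
        Map.Entry<Long, Range> entry = byStart.floorEntry(ip);
        if (entry == null || ip > entry.getValue().end) {
            return null; // ip falls in a gap between known networks
        }
        return entry.getValue().geonameId;
    }
}
```

Memory grows with the number of networks rather than the number of distinct client IPs, which is the main difference from a per-IP cache.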
>>>>>>>>>
>>>>>>>> We thought of having an in-memory cache, but we faced several
>>>>>>>> issues along the way. Let me explain the situation as it stands now.
>>>>>>>>
>>>>>>>> The Max-Mind DB stores each network as an IP address and a netmask,
>>>>>>>> e.g. 192.168.0.0/20
>>>>>>>>
>>>>>>>> The calculation of the IP address range would be like the following.
>>>>>>>>
>>>>>>>> Address:   192.168.0.1           11000000.10101000.0000 0000.00000001
>>>>>>>> Netmask:   255.255.240.0 = 20    11111111.11111111.1111 0000.00000000
>>>>>>>> Wildcard:  0.0.15.255            00000000.00000000.0000 1111.11111111
>>>>>>>> =>
>>>>>>>> Network:   192.168.0.0/20        11000000.10101000.0000 0000.00000000 (Class C)
>>>>>>>> Broadcast: 192.168.15.255        11000000.10101000.0000 1111.11111111
>>>>>>>> HostMin:   192.168.0.1           11000000.10101000.0000 0000.00000001
>>>>>>>> HostMax:   192.168.15.254        11000000.10101000.0000 1111.11111110
>>>>>>>> Hosts/Net: 4094                  (Private Internet, RFC 1918: http://www.ietf.org/rfc/rfc1918.txt)
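The ipcalc numbers above can be reproduced with plain bit arithmetic. This is a sketch: `SubnetInfo` is a hypothetical helper, IPv4 addresses are held in `long` so the values stay unsigned, input is not validated, and the host count assumes a prefix length of 30 or less (/31 and /32 are special cases).

```java
// Hypothetical helper reproducing ipcalc-style values for an IPv4 address/prefix.
// Addresses are kept in long so they stay unsigned; host math is valid for /0../30.
class SubnetInfo {
    final long network;
    final long broadcast;

    SubnetInfo(String ip, int prefix) {
        long address = toLong(ip);
        long netmask = (0xFFFFFFFFL << (32 - prefix)) & 0xFFFFFFFFL; // /20 -> 255.255.240.0
        long wildcard = ~netmask & 0xFFFFFFFFL;                      // /20 -> 0.0.15.255
        this.network = address & netmask;
        this.broadcast = network | wildcard;
    }

    long hostMin() { return network + 1; }
    long hostMax() { return broadcast - 1; }
    long hosts()   { return broadcast - network - 1; } // excludes network and broadcast

    // No validation: assumes a well-formed dotted quad.
    static long toLong(String ip) {
        long value = 0;
        for (String part : ip.split("\\.")) {
            value = (value << 8) | Integer.parseInt(part);
        }
        return value;
    }

    static String toDotted(long v) {
        return (v >> 24 & 0xFF) + "." + (v >> 16 & 0xFF) + "." + (v >> 8 & 0xFF) + "." + (v & 0xFF);
    }
}
```

For `new SubnetInfo("192.168.0.1", 20)` this gives network 192.168.0.0, broadcast 192.168.15.255, hosts 192.168.0.1 to 192.168.15.254, and 4094 usable hosts, matching the table.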
>>>>>>>>
>>>>>>>>
>>>>>>>> Therefore, what we are currently doing is calculating the start and
>>>>>>>> end IP for every network in the max-mind DB and altering the tables to
>>>>>>>> store those values (a one-time step). When the Spark script executes,
>>>>>>>> we check whether the given IP falls between any of the start and end
>>>>>>>> ranges in the tables. This range check is the reason it takes a long
>>>>>>>> time to fetch results for a given IP.
>>>>>>>>
>>>>>>>> As a solution, we discussed the options Tharindu mentioned:
>>>>>>>> 1. Have an in-memory caching mechanism.
>>>>>>>> 2. Have a DB-based caching mechanism.
>>>>>>>>
>>>>>>>> The point we have to highlight is that in both of the above
>>>>>>>> mechanisms we need to cache each IP address (not the IP-netmask as
>>>>>>>> stored in the max-mind DB) against the geolocation.
>>>>>>>>
>>>>>>>> Ex:-
>>>>>>>> For 192.168.0.1       - Colombo, Sri Lanka
>>>>>>>> For 192.168.15.254 - Colombo, Sri Lanka
>>>>>>>>
>>>>>>>> So, in the above example, if there are requests from all 4094
>>>>>>>> possible addresses, we will be caching each IP with its geolocation
>>>>>>>> (since introducing range queries in a cache is not a good practice).
>>>>>>>>
>>>>>>>
>>>>>> Since we are implementing a custom cache, won't we be doing a bitwise
>>>>>> operation with the netmask and network IP for the lookup? Basically, we
>>>>>> would keep the network IP and the netmask in the cache and simply do a
>>>>>> bitwise AND to determine whether it is a match, right? I think such an
>>>>>> operation would not incur much of a performance hit, and it won't be as
>>>>>> prohibitive as a normal range query in a cache. If that is the case, I
>>>>>> think we can go with the approach suggested by Sanjeewa.
>>>>>>
>>>>>> WDYT?
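A sketch of caching networks instead of individual IPs, as Lasantha suggests: one `HashMap` per prefix length, keyed by the masked network address, with a longest-prefix-match lookup that tries /32 down to /0. Each lookup is bounded to at most 33 hash probes however many distinct IPs arrive. The class `NetworkCache` and its layout are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical network-level cache (IPv4): one map per prefix length,
// keyed by the masked network address.
class NetworkCache {
    private final List<Map<Integer, String>> byPrefix = new ArrayList<>();

    NetworkCache() {
        for (int p = 0; p <= 32; p++) {
            byPrefix.add(new HashMap<>());
        }
    }

    static int mask(int prefix) {
        return prefix == 0 ? 0 : -1 << (32 - prefix); // /0 special-cased (shift is mod 32)
    }

    static int toInt(String ip) {
        int value = 0;
        for (String part : ip.split("\\.")) {
            value = (value << 8) | Integer.parseInt(part);
        }
        return value;
    }

    // Cache a whole network (e.g. 192.168.0.0/20) instead of each IP inside it.
    void put(String networkIp, int prefix, String location) {
        byPrefix.get(prefix).put(toInt(networkIp) & mask(prefix), location);
    }

    // Longest-prefix match: the most specific cached network wins.
    String get(String ip) {
        int address = toInt(ip);
        for (int p = 32; p >= 0; p--) {
            Map<Integer, String> networks = byPrefix.get(p);
            if (networks.isEmpty()) {
                continue;
            }
            String location = networks.get(address & mask(p));
            if (location != null) {
                return location;
            }
        }
        return null; // miss: fall back to the DB, then put(...) the matched network
    }
}
```

One cached /20 entry then answers for all 4094 hosts in the example above, instead of one entry per IP.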
>>>>>>
>>>>>>
>>>>>>>> Please find my comments about both the approaches.
>>>>>>>>
>>>>>>>> 1. Having an in-memory cache would speed things up, but depending on
>>>>>>>> the IPs in the data set, there could be many entries for IPs in the
>>>>>>>> same range. One problem with this approach is that after a server
>>>>>>>> restart, the initial script execution would take a lot of time. Also,
>>>>>>>> in certain scenarios (a high number of distinct IPs), the cache would
>>>>>>>> not have a significant effect on script execution performance.
>>>>>>>>
>>>>>>>> 2. Having a DB-based cache would persist the data even across a
>>>>>>>> restart, and the data-fetching query would search for a specific value
>>>>>>>> (not a range query as with the max-mind DB). The downside is that a
>>>>>>>> cache miss would need at least 3 DB queries (one for the cache table
>>>>>>>> lookup, one for the max-mind DB lookup, and one to persist the cache
>>>>>>>> entry).
>>>>>>>>
>>>>>>>> That is why we have initiated this thread to finalize the caching
>>>>>>>> approach we should take.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Janaka
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> sanjeewa.
>>>>>>>>>
>>>>>>>>
>>>>>> Thanks,
>>>>>> Lasantha
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>>> On Fri, Mar 4, 2016 at 3:12 PM, Tharindu Dharmarathna <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi All,
>>>>>>>>>>
>>>>>>>>>> We are going to implement a client-IP-based geolocation graph in
>>>>>>>>>> API Manager Analytics. After going through the options in [1], we
>>>>>>>>>> selected [2] as the most suitable approach.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> *Overview of max-mind's DB.*
>>>>>>>>>>
>>>>>>>>>> As shown in the structure of the DB (attached as an image), there
>>>>>>>>>> are two tables that are used together to get the location.
>>>>>>>>>>
>>>>>>>>>> We find the geoname_id for the network and then get the country and
>>>>>>>>>> city from the locations table.
>>>>>>>>>>
>>>>>>>>>> *Limitations*
>>>>>>>>>>
>>>>>>>>>> With their database dump, we couldn't look up an IP directly in
>>>>>>>>>> those tables. We need to check whether the given IP is between the
>>>>>>>>>> network's min and max IP. This query takes a long time (10 seconds
>>>>>>>>>> even on indexed data). If we do this directly from the Spark script,
>>>>>>>>>> the tables will be queried for every IP in the summary table (even
>>>>>>>>>> if the same IP appears in two rows). Therefore, this will have a
>>>>>>>>>> performance impact on this graph.
>>>>>>>>>>
>>>>>>>>>> *Solution*
>>>>>>>>>>
>>>>>>>>>> 1. Implement an LRU cache of IP address vs. location.
>>>>>>>>>>
>>>>>>>>>> This needs to be implemented as a custom UDF in Spark. If the IP
>>>>>>>>>> queried from Spark is available in the cache, the UDF returns the
>>>>>>>>>> location from it; if not, it retrieves the location from the DB and
>>>>>>>>>> puts it into the cache.
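A minimal LRU cache for option 1, built on `LinkedHashMap` in access order with `removeEldestEntry`. The DB query is a `Function` stand-in and `LruGeoCache` is a hypothetical name; how it is wired into the Spark UDF is not shown. `locate` is synchronized because an access-ordered `LinkedHashMap` is structurally modified even by `get`, so a cache shared across threads needs a lock.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical LRU cache of IP -> location; the DB query is a stand-in Function.
class LruGeoCache {
    private final Map<String, String> entries;
    private final Function<String, String> dbLookup;

    LruGeoCache(int maxEntries, Function<String, String> dbLookup) {
        this.dbLookup = dbLookup;
        // accessOrder = true: iteration runs from least to most recently accessed
        this.entries = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > maxEntries; // evict the least recently used entry
            }
        };
    }

    // get() reorders an access-ordered LinkedHashMap, so even reads take the lock.
    synchronized String locate(String ip) {
        String location = entries.get(ip);
        if (location == null) {
            location = dbLookup.apply(ip); // cache miss: query the DB
            if (location != null) {
                entries.put(ip, location);
            }
        }
        return location;
    }

    synchronized int size() { return entries.size(); }

    synchronized boolean contains(String ip) { return entries.containsKey(ip); }
}
```

With a capacity of 2, the sequence `a, b, a, c` evicts `b`, because re-reading `a` made it more recent than `b`.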
>>>>>>>>>>
>>>>>>>>>> 2. Persist in a table
>>>>>>>>>>
>>>>>>>>>> Use the IP as the primary key with country and city as the other
>>>>>>>>>> columns, and retrieve data from that table.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Please feel free to suggest the most suitable way of implementing
>>>>>>>>>> this solution.
>>>>>>>>>>
>>>>>>>>>> [1] - Implementing Geographical based Analytics in API Manager
>>>>>>>>>> mail thread.
>>>>>>>>>>
>>>>>>>>>> [2] - http://dev.maxmind.com/geoip/geoip2/geolite2/
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> *Thanks*
>>>>>>>>>>
>>>>>>>>>> *Tharindu Dharmarathna*
>>>>>>>>>> Associate Software Engineer
>>>>>>>>>> WSO2 Inc.; http://wso2.com
>>>>>>>>>> lean.enterprise.middleware
>>>>>>>>>>
>>>>>>>>>> mobile: *+94779109091*
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>>
>>>>>>>>> *Sanjeewa Malalgoda*
>>>>>>>>> WSO2 Inc.
>>>>>>>>> Mobile : +94713068779
>>>>>>>>>
>>>>>>>>> blog: http://sanjeewamalalgoda.blogspot.com/
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> *Janaka Ranabahu*
>>>>>>>> Associate Technical Lead, WSO2 Inc.
>>>>>>>> http://wso2.com
>>>>>>>>
>>>>>>>>
>>>>>>>> *E-mail: [email protected]* *M: +94 718370861*
>>>>>>>>
>>>>>>>> Lean . Enterprise . Middleware
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Sachith Withana
>>>>>>> Software Engineer; WSO2 Inc.; http://wso2.com
>>>>>>> E-mail: sachith AT wso2.com
>>>>>>> M: +94715518127
>>>>>>> Linked-In: https://lk.linkedin.com/in/sachithwithana
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Architecture mailing list
>>>>>>> [email protected]
>>>>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> *Lasantha Fernando*
>>>>>> Senior Software Engineer - Data Technologies Team
>>>>>> WSO2 Inc. http://wso2.com
>>>>>>
>>>>>> email: [email protected]
>>>>>> mobile: (+94) 71 5247551
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>
>
>
>


