Hi Kishanthan,

Started work on using the same UDF.

Thanks,
Lochana

On Thu, Mar 17, 2016 at 11:04 PM, Kishanthan Thangarajah <
[email protected]> wrote:

> Lochana, let's try to reuse this udf for AS analytics too.
>
> On Thu, Mar 17, 2016 at 2:43 PM, Janaka Ranabahu <[email protected]> wrote:
>
>> Hi Tharindu,
>>
>> Great work. Can we do a performance test of this and share the results?
>> Basically, what we need to check is how much time a script would take
>> to execute.
>>
>> Thanks,
>> Janaka
>>
>> On Thu, Mar 17, 2016 at 2:37 PM, Tharindu Dharmarathna <
>> [email protected]> wrote:
>>
>>> Hi All,
>>>
>>> After going through the above discussion, we have implemented a pluggable
>>> user-defined extension point. With this configuration we can write our own
>>> implementation, which can be used to get the country and state of a given IP.
>>>
>>> *Caching Implementation*
>>>
>>> We define two levels of caching, as below.
>>>
>>> When an IP address is looked up from the *UDF*, it first checks the cache
>>> for the location information. If it is not in the cache, it checks another
>>> database, which contains a direct IP-to-location mapping as *Sajith*
>>> mentioned. If it is there, it returns that location and caches it. If the
>>> location is not in that database, the IP is checked against the *MAXMIND*
>>> database, and the location is stored in the cache and in the above table.
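
A minimal sketch of the lookup order described above, written in Python for brevity and not as the actual UDF. `GeoResolver`, `direct_table`, `range_lookup` and `fake_maxmind` are hypothetical names, a plain dict stands in for the direct IP-to-location table, and the sample location is invented for illustration.

```python
import ipaddress


class GeoResolver:
    """Hypothetical resolver: memory cache, then direct IP table, then range database."""

    def __init__(self, direct_table, range_lookup):
        self.cache = {}                    # level 1: in-memory cache
        self.direct_table = direct_table   # level 2: stand-in for the IP-to-location table
        self.range_lookup = range_lookup   # stand-in for the expensive MaxMind lookup
        self.range_calls = 0               # counts how often the expensive path is taken

    def resolve(self, ip):
        # 1. Cheapest path: the in-memory cache.
        if ip in self.cache:
            return self.cache[ip]
        # 2. Exact-match lookup in the direct mapping table.
        location = self.direct_table.get(ip)
        if location is None:
            # 3. Fall back to the range database and persist the answer.
            self.range_calls += 1
            location = self.range_lookup(ip)
            if location is not None:
                self.direct_table[ip] = location
        if location is not None:
            self.cache[ip] = location
        return location


def fake_maxmind(ip):
    # Placeholder for the MaxMind lookup; the network and location are made up.
    network = ipaddress.ip_network("192.168.0.0/20")
    if ipaddress.ip_address(ip) in network:
        return ("Sri Lanka", "Colombo")
    return None
```

With this shape, a repeated IP only reaches the range database once; every later request is answered from the cache or from the direct table.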
>>>
>>> Thanks
>>> Tharindu
>>>
>>>
>>> On Tue, Mar 8, 2016 at 2:34 PM, Tharindu Dharmarathna <
>>> [email protected]> wrote:
>>>
>>>> Hi All,
>>>>
>>>> We have come across the following ways to do the above task after the
>>>> initial POC.
>>>>
>>>> 1. Use the file-type database provided by max-mind (.mmdb) together with
>>>> their database readers.
>>>>
>>>> With this approach we got a lower lookup time, using a JAX-RS service
>>>> that wraps the above database. By default this JAX-RS implementation
>>>> uses max-mind's cache implementation, which can be found at [1].
>>>>
>>>> *Limitations*
>>>>
>>>>
>>>>    - The JAX-RS app has to be hosted on another server.
>>>>    - The number of HTTP calls will be high.
>>>>
>>>>
>>>> 2. Call the query server as described earlier in this thread and cache
>>>> the location against the IP.
>>>>
>>>> Here are the execution times for a single query with each method.
>>>>
>>>>
>>>> *Method 1 : 4.5 seconds*
>>>>
>>>> *Method 2: 4.76 seconds*
>>>>
>>>>
>>>> Thanks
>>>> Tharindu
>>>>
>>>>
>>>> On Tue, Mar 8, 2016 at 8:29 AM, Lasantha Fernando <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi Tharindu,
>>>>>
>>>>> On 7 March 2016 at 21:10, Sajith Ravindra <[email protected]> wrote:
>>>>>
>>>>>>
>>>>>>> 2. Having a DB based cache would persist the data even on a restart
>>>>>>> and the data fetching query would be searching for a specific value (not a
>>>>>>> range query as against the max-mind DB). But the downside is that for a
>>>>>>> cache miss there would be a minimum of 3 DB queries (one for the cache table
>>>>>>> lookup, one for the max-mind db lookup and one for the
>>>>>>> cache persistence).
>>>>>>>
>>>>>>
>>>>>> In order to avoid expensive cache misses we may eagerly populate the
>>>>>> DB table cache, i.e. when there's a cache miss we do the lookup in the
>>>>>> max-mind db and then add entries for multiple IPs of that network_cidr to
>>>>>> the cache DB table instead of only for that particular IP. That way we
>>>>>> reduce the chance of a cache miss being very expensive, as we increase the
>>>>>> chance of the IP being found on the first DB lookup.
>>>>>>
>>>>>> We might need to do some evaluation to determine how many entries
>>>>>> we are going to add to the DB cache for IPs belonging to a particular
>>>>>> network_cidr. For example, if requests from a certain network_cidr are
>>>>>> frequent we may want to add more entries compared to a less frequent
>>>>>> network_cidr.
>>>>>>
>>>>>> The downside is that the DB cache tends to be bigger.
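
A rough sketch of the eager population idea, assuming (this is not stated in the thread) that the extra rows are the requested IP plus the addresses that follow it inside the same network, capped by a `limit`. The function name `eager_entries` and the `limit` parameter are hypothetical.

```python
import ipaddress


def eager_entries(ip, cidr, location, limit=16):
    """Return up to `limit` (ip, location) rows of `cidr`, starting at `ip`."""
    network = ipaddress.ip_network(cidr)
    start = ipaddress.ip_address(ip)
    if start not in network:
        raise ValueError(f"{ip} is not inside {cidr}")
    first = int(start)
    # Never step past the last address of the network.
    last = min(int(network.broadcast_address), first + limit - 1)
    # Rebuild addresses with the same class (IPv4 or IPv6) as the input.
    return [(str(type(start)(n)), location) for n in range(first, last + 1)]
```

A larger `limit` for frequently seen networks trades table size for fewer expensive misses, which is the evaluation mentioned above.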
>>>>>>
>>>>>> Thanks
>>>>>> *,Sajith Ravindra*
>>>>>> Senior Software Engineer
>>>>>> WSO2 Inc.; http://wso2.com
>>>>>> lean.enterprise.middleware
>>>>>>
>>>>>> mobile: +94 77 2273550
>>>>>> blog: http://sajithr.blogspot.com/
>>>>>> <http://lk.linkedin.com/pub/shani-ranasinghe/34/111/ab>
>>>>>>
>>>>>> On Mon, Mar 7, 2016 at 4:37 AM, Tharindu Dharmarathna <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> Hi Lasantha,
>>>>>>>
>>>>>>> Up to now we have been doing the following in order to get the geo
>>>>>>> location from the stated dump.
>>>>>>>
>>>>>>> 1. Two columns were added, filled with the long values of the lower and
>>>>>>> upper network IP addresses. We then get the geoname_id of the row where
>>>>>>> the long value of the given IP lies between these two long values. Hope
>>>>>>> you get the idea of our approach. Is there any way to do a bitwise
>>>>>>> operation in order to get the network_cidr value?
>>>>>>>
>>>>>>
>>>>> Can't we do it by keeping the network IP and the subnet as two columns
>>>>> and the geoname_id as the third? Say, for example, 192.168.0.0/20
>>>>> is the cidr. For IPv4 routing what is usually done is we get the IP as an int,
>>>>> then do a bitwise AND with the subnet mask (e.g. if the subnet mask is 20, that
>>>>> would mean 20 bits with value 1 and the remaining 12 bits with value 0, i.e.
>>>>> 11111111 11111111 11110000 00000000) and check whether that returns the
>>>>> network IP.
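
The bitwise check can be sketched as follows. This is an illustration in Python, not the proposed cache implementation, and the helper names `ip_to_int`, `prefix_mask` and `in_network` are hypothetical.

```python
import ipaddress


def ip_to_int(ip):
    # Dotted-quad IPv4 string to a 32-bit integer.
    return int(ipaddress.IPv4Address(ip))


def prefix_mask(prefix_len):
    # prefix_len leading 1 bits followed by zeros, e.g. 20 -> 0xFFFFF000.
    return (0xFFFFFFFF << (32 - prefix_len)) & 0xFFFFFFFF


def in_network(ip, network_ip, prefix_len):
    # The IP belongs to the network if masking it yields the network IP.
    return (ip_to_int(ip) & prefix_mask(prefix_len)) == ip_to_int(network_ip)
```

For the 192.168.0.0/20 example, `in_network("192.168.15.254", "192.168.0.0", 20)` is true, while 192.168.16.1 falls outside because masking it gives 192.168.16.0.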
>>>>>
>>>>> You might find more info here [1]. I think there should be libraries
>>>>> that wrap this operation. But if performance is a concern and we need to
>>>>> keep the cache search implementation very lean, we can implement it
>>>>> ourselves.
>>>>>
>>>>> WDYT?
>>>>>
>>>>> [1]
>>>>> http://stackoverflow.com/questions/4209760/validate-an-ip-address-with-mask
>>>>>
>>>>> Thanks,
>>>>> Lasantha
>>>>>
>>>>>
>>>>>>> Thanks
>>>>>>> Tharindu
>>>>>>>
>>>>>>> On Mon, Mar 7, 2016 at 12:05 AM, Lasantha Fernando <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> I think what Sachith suggests also makes sense. But am also rooting
>>>>>>>> for the in-memory cache implementation suggested by Sanjeewa with
>>>>>>>> ip-netmask approach.
>>>>>>>>
>>>>>>>> Please find my comments inline.
>>>>>>>>
>>>>>>>> On 5 March 2016 at 23:50, Sachith Withana <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> From what I understand/was told, this happens once a day (or
>>>>>>>>> relatively infrequently), and you wanna avoid searching through all the
>>>>>>>>> geo data per IP (since you are grouping the requests by IP).
>>>>>>>>>
>>>>>>>>> If that's the case, it would be better to use a separate DB table
>>>>>>>>> to cache these data (IP, geoID, etc.) with the IP being the primary key
>>>>>>>>> (which would improve the lookup time), and even though there will be
>>>>>>>>> cache misses, it would eventually reduce the (#cacheMisses / Hits).
>>>>>>>>>
>>>>>>>>> Having a DB cache would be better since you do want to persist
>>>>>>>>> these data to be used over time.
>>>>>>>>>
>>>>>>>>> BTW in a cache miss, if we can figure out a way to limit the
>>>>>>>>> search range on the original table or at least stop the search once a
>>>>>>>>> match is found, it would greatly improve the cache miss time as well.
>>>>>>>>>
>>>>>>>>> That's my two cents.
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> Sachith
>>>>>>>>>
>>>>>>>>> On Sun, Mar 6, 2016 at 8:24 AM, Janaka Ranabahu <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Sanjeewa,
>>>>>>>>>>
>>>>>>>>>> On Sun, Mar 6, 2016 at 7:25 AM, Sanjeewa Malalgoda <
>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Implementing a cache is better than having another table mapping
>>>>>>>>>>> IMO. What if we query the database and keep the IP range and network
>>>>>>>>>>> name in memory? Then we may do a quick search on the network name and
>>>>>>>>>>> then based on that the rest can be loaded some other way.
>>>>>>>>>>> WDYT?
>>>>>>>>>>>
>>>>>>>>>> We thought of having an in-memory cache but we faced several
>>>>>>>>>> issues along the way. Let me explain the situation as it stands now.
>>>>>>>>>>
>>>>>>>>>> The Max-Mind DB has the IP addresses with the IP and the netmask.
>>>>>>>>>> Ex: 192.168.0.0/20
>>>>>>>>>>
>>>>>>>>>> The calculation of the IP address range would be like the
>>>>>>>>>> following.
>>>>>>>>>>
>>>>>>>>>> Address:   192.168.0.1           11000000.10101000.0000 0000.00000001
>>>>>>>>>> Netmask:   255.255.240.0 = 20    11111111.11111111.1111 0000.00000000
>>>>>>>>>> Wildcard:  0.0.15.255            00000000.00000000.0000 1111.11111111
>>>>>>>>>> =>Network: 192.168.0.0/20        11000000.10101000.0000 0000.00000000 (Class C)
>>>>>>>>>> Broadcast: 192.168.15.255        11000000.10101000.0000 1111.11111111
>>>>>>>>>> HostMin:   192.168.0.1           11000000.10101000.0000 0000.00000001
>>>>>>>>>> HostMax:   192.168.15.254        11000000.10101000.0000 1111.11111110
>>>>>>>>>> Hosts/Net: 4094                  (Private Internet <http://www.ietf.org/rfc/rfc1918.txt>)
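
The same calculation can be reproduced with Python's standard `ipaddress` module. This is only an illustrative sketch; the function name `describe` and the dictionary keys are hypothetical, and the host count assumes a prefix of /30 or shorter.

```python
import ipaddress


def describe(cidr):
    """Derive the range values for an IPv4 address/prefix such as 192.168.0.1/20."""
    # strict=False lets a host address (192.168.0.1/20) stand in for its network.
    net = ipaddress.ip_network(cidr, strict=False)
    return {
        "network": str(net.network_address),
        "netmask": str(net.netmask),
        "wildcard": str(net.hostmask),
        "broadcast": str(net.broadcast_address),
        "host_min": str(net.network_address + 1),
        "host_max": str(net.broadcast_address - 1),
        # Usable hosts exclude the network and broadcast addresses.
        "hosts": net.num_addresses - 2,
    }
```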
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Therefore what we are currently doing is to calculate the start
>>>>>>>>>> and end IP for all the values in the max-mind DB and alter the tables
>>>>>>>>>> with those values initially (this is a one-time thing). When the
>>>>>>>>>> Spark script executes, we check whether the given IP is between any of
>>>>>>>>>> the start and end ranges in the tables. That is the reason why it is
>>>>>>>>>> taking a long time to fetch results for a given IP.
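
To illustrate why this lookup is slow, here is a small sketch of the start/end range check, using an in-memory list in place of the altered max-mind tables. The names `to_range_row` and `find_geoname_id` and the geoname_id values are made up for the example.

```python
import ipaddress


def to_range_row(cidr, geoname_id):
    # One-time step: turn a network into (start, end, geoname_id) integers.
    net = ipaddress.ip_network(cidr)
    return (int(net.network_address), int(net.broadcast_address), geoname_id)


def find_geoname_id(ip, rows):
    # Range check: every row may need to be compared until one matches.
    value = int(ipaddress.ip_address(ip))
    for start, end, geoname_id in rows:
        if start <= value <= end:
            return geoname_id
    return None
```

Unlike an exact-match lookup on a primary key, the comparison has to be evaluated against many rows, which is what an IP-to-location cache avoids.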
>>>>>>>>>>
>>>>>>>>>> As a solution for this, we discussed what Tharindu has mentioned.
>>>>>>>>>> 1. Have a in memory caching mechanism.
>>>>>>>>>> 2. Have a DB based caching mechanism.
>>>>>>>>>>
>>>>>>>>>> The only point that we have to highlight is the fact that in both
>>>>>>>>>> the above mechanisms we need to cache the IP address (not the
>>>>>>>>>> ip-netmask as it was in the max-mind db) against the Geo location.
>>>>>>>>>>
>>>>>>>>>> Ex:-
>>>>>>>>>> For 192.168.0.1       - Colombo, Sri Lanka
>>>>>>>>>> For 192.168.15.254 - Colombo, Sri Lanka
>>>>>>>>>>
>>>>>>>>>> So as per the above example, if there are requests from
>>>>>>>>>> all the possible 4094 addresses we will be caching each IP with the Geo
>>>>>>>>>> location (since introducing range queries in a cache is not a good
>>>>>>>>>> practice).
>>>>>>>>>>
>>>>>>>>>
>>>>>>>> Since we are implementing a custom cache, won't we be doing a
>>>>>>>> bitwise operation for the lookup with netmask and network IP? So
>>>>>>>> basically, we would keep the network IP and the netmask in cache and
>>>>>>>> simply do a bitwise AND to determine whether it is a match or not,
>>>>>>>> right? Am thinking such an operation would not incur much of a
>>>>>>>> performance hit and it won't be as prohibitive as a normal range query
>>>>>>>> in a cache. If that is the case, I think we can go with the approach
>>>>>>>> suggested by Sanjeewa.
>>>>>>>>
>>>>>>>> WDYT?
>>>>>>>>
>>>>>>>>
>>>>>>>>>> Please find my comments about both the approaches.
>>>>>>>>>>
>>>>>>>>>> 1. Having an in-memory cache would speed things up, but based on
>>>>>>>>>> the IPs in the data set, there could be a number of entries for IPs in
>>>>>>>>>> the same range. One problem with this approach is that, if there is a
>>>>>>>>>> server restart, the initial script execution would take a lot of time.
>>>>>>>>>> Also, in certain scenarios (a high number of different IPs) the cache
>>>>>>>>>> would not have a significant effect on script execution performance.
>>>>>>>>>>
>>>>>>>>>> 2. Having a DB based cache would persist the data even on a
>>>>>>>>>> restart and the data fetching query would be searching for a specific
>>>>>>>>>> value (not a range query as against the max-mind DB). But the downside
>>>>>>>>>> is that for a cache miss there would be a minimum of 3 DB queries (one
>>>>>>>>>> for the cache table lookup, one for the max-mind db lookup and one for
>>>>>>>>>> the cache persistence).
>>>>>>>>>>
>>>>>>>>>> That is why we have initiated this thread to finalize the caching
>>>>>>>>>> approach we should take.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Janaka
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> sanjeewa.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Lasantha
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>>> On Fri, Mar 4, 2016 at 3:12 PM, Tharindu Dharmarathna <
>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>
>>>>>>>>>>>> We are going to implement a client IP based Geo-location graph in
>>>>>>>>>>>> API Manager Analytics. When we went through the ways of doing it in
>>>>>>>>>>>> [1], we selected [2] as the most suitable way.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> *Overview of max-mind's DB.*
>>>>>>>>>>>>
>>>>>>>>>>>> As shown in the structure of the db (attached as an image), they have
>>>>>>>>>>>> two tables that are combined to get the location.
>>>>>>>>>>>>
>>>>>>>>>>>> Find the geoname_id according to the network and get the Country and
>>>>>>>>>>>> City from the locations table.
>>>>>>>>>>>>
>>>>>>>>>>>> *Limitations*
>>>>>>>>>>>>
>>>>>>>>>>>> With their database dump we couldn't directly look up the IP in
>>>>>>>>>>>> those tables. We need to check whether the given IP is between the
>>>>>>>>>>>> network min and max IP. This query takes a long time (10 seconds on
>>>>>>>>>>>> indexed data). If we do this directly from the Spark script, each and
>>>>>>>>>>>> every IP in the summary table (even when two rows carry the same IP)
>>>>>>>>>>>> will be queried against the tables. Therefore this will have a
>>>>>>>>>>>> performance impact on this graph.
>>>>>>>>>>>>
>>>>>>>>>>>> *Solution*
>>>>>>>>>>>>
>>>>>>>>>>>> 1. Implement LRU cache against ip address vs location.
>>>>>>>>>>>>
>>>>>>>>>>>> This will need to be implemented in a custom UDF in Spark. If the IP
>>>>>>>>>>>> queried from Spark is available in the cache, the UDF returns the
>>>>>>>>>>>> location from it; if it is not, it retrieves the location from the DB
>>>>>>>>>>>> and puts it into the cache.
>>>>>>>>>>>>
>>>>>>>>>>>> 2. Persist in a Table
>>>>>>>>>>>>
>>>>>>>>>>>> ip as the primary key and Country and city as other columns and
>>>>>>>>>>>> retrieve data from that table.
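
Option 1 can be sketched as below. This is a minimal illustration in Python and not the Spark UDF itself; `LruCache`, `lookup` and `load_from_db` are hypothetical names, and `load_from_db` stands in for whatever query fetches the location on a miss.

```python
from collections import OrderedDict


class LruCache:
    """Small least-recently-used cache keyed by IP address."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        # Mark the entry as most recently used.
        self._items.move_to_end(key)
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            # Evict the least recently used entry.
            self._items.popitem(last=False)


def lookup(ip, cache, load_from_db):
    # Serve from the cache when possible; otherwise load and remember.
    location = cache.get(ip)
    if location is None:
        location = load_from_db(ip)
        if location is not None:
            cache.put(ip, location)
    return location
```

Because the summary table can repeat the same IP across rows, repeated IPs are answered from memory and only the first occurrence pays for the DB query.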
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Please feel free to suggest the most suitable way of implementing
>>>>>>>>>>>> this solution.
>>>>>>>>>>>>
>>>>>>>>>>>> [1] - Implementing Geographical based Analytics in API Manager
>>>>>>>>>>>> mail thread.
>>>>>>>>>>>>
>>>>>>>>>>>> [2] - http://dev.maxmind.com/geoip/geoip2/geolite2/
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> *Thanks*
>>>>>>>>>>>>
>>>>>>>>>>>> *Tharindu Dharmarathna*
>>>>>>>>>>>> Associate Software Engineer
>>>>>>>>>>>> WSO2 Inc.; http://wso2.com
>>>>>>>>>>>> lean.enterprise.middleware
>>>>>>>>>>>>
>>>>>>>>>>>> mobile: *+94779109091 <%2B94779109091>*
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>>
>>>>>>>>>>> *Sanjeewa Malalgoda*
>>>>>>>>>>> WSO2 Inc.
>>>>>>>>>>> Mobile : +94713068779
>>>>>>>>>>>
>>>>>>>>>>> <http://sanjeewamalalgoda.blogspot.com/>blog
>>>>>>>>>>> :http://sanjeewamalalgoda.blogspot.com/
>>>>>>>>>>> <http://sanjeewamalalgoda.blogspot.com/>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> *Janaka Ranabahu*
>>>>>>>>>> Associate Technical Lead, WSO2 Inc.
>>>>>>>>>> http://wso2.com
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> *E-mail: [email protected] <http://wso2.com>**M: **+94 718370861
>>>>>>>>>> <%2B94%20718370861>*
>>>>>>>>>>
>>>>>>>>>> Lean . Enterprise . Middleware
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Sachith Withana
>>>>>>>>> Software Engineer; WSO2 Inc.; http://wso2.com
>>>>>>>>> E-mail: sachith AT wso2.com
>>>>>>>>> M: +94715518127
>>>>>>>>> Linked-In: <http://goog_416592669>
>>>>>>>>> https://lk.linkedin.com/in/sachithwithana
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> Architecture mailing list
>>>>>>>>> [email protected]
>>>>>>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> *Lasantha Fernando*
>>>>>>>> Senior Software Engineer - Data Technologies Team
>>>>>>>> WSO2 Inc. http://wso2.com
>>>>>>>>
>>>>>>>> email: [email protected]
>>>>>>>> mobile: (+94) 71 5247551
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>>
>
>
>
> --
> *Kishanthan Thangarajah*
> Associate Technical Lead,
> Platform Technologies Team,
> WSO2, Inc.
> lean.enterprise.middleware
>
> Mobile - +94773426635
> Blog - *http://kishanthan.wordpress.com <http://kishanthan.wordpress.com>*
> Twitter - *http://twitter.com/kishanthan <http://twitter.com/kishanthan>*
>



-- 
Lochana Ranaweera
Intern Software Engineer
WSO2 Inc: http://wso2.com
Blog: https://lochanaranaweera.wordpress.com/
Mobile: +94716487055
_______________________________________________
Architecture mailing list
[email protected]
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
