Re: [Architecture] [ML] Anomaly Detection Feature for WSO2 ML

Nirmal Fernando Fri, 20 Nov 2015 03:27:50 -0800

I have to mention the excellent work done and the support extended by
Ashen throughout the implementation process. He grasped things quickly,
sought regular feedback, and improved very fast. Kudos to Ashen, and
keep up the good work! I hope you have learnt a lot during this process.


On Thu, Nov 19, 2015 at 10:43 AM, Seshika Fernando <[email protected]> wrote:

> Well done, Ashen. The documentation looks good too. I will review it later.
>
> seshi
>
> On Thu, Nov 19, 2015 at 10:35 AM, Ashen Weerathunga <[email protected]>
> wrote:
>
>> Hi all,
>>
>> This feature has been implemented in ML and released with WSO2 Machine
>> Learner 1.1.0 - Milestone 1
>> <https://github.com/wso2/product-ml/releases/tag/v1.1.0-m1>. Thanks,
>> everyone, for your ideas and support. Please find the attachments.
>>
>> [1] PR - carbon-ml: https://github.com/wso2/carbon-ml/pull/138
>> [2] PR - product-ml: https://github.com/wso2/product-ml/pull/263
>>
>> Thanks and Regards,
>> Ashen
>>
>> On Mon, Sep 28, 2015 at 11:12 PM, Ashen Weerathunga <[email protected]>
>> wrote:
>>
>>> Sure, thanks Mahesan!
>>>
>>> On Mon, Sep 28, 2015 at 9:51 AM, Sinnathamby Mahesan <
>>> [email protected]> wrote:
>>>
>>>
>>>> ---------- Forwarded message ----------
>>>> From: Sinnathamby Mahesan <[email protected]>
>>>> Date: 28 September 2015 at 09:50
>>>> Subject: Re: [Architecture] [ML] Anomaly Detection Feature for WSO2 ML
>>>> To: [email protected]
>>>> Cc: Nirmal Fernando <[email protected]>
>>>>
>>>>
>>>> Dear Ashen,
>>>> I know you have programmed it correctly,
>>>>
>>>> but here too
>>>> it is better to show that
>>>>
>>>> if (di > ri) for all i = 1..k  =>  Anomalous
>>>>
>>>> where k is the number of clusters,
>>>> di is the distance between the point under consideration and the
>>>> cluster centre i,
>>>> and
>>>> ri is the percentile radius of cluster i.
>>>>
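To make the "for all i" condition concrete, here is a minimal sketch in Python. It assumes Euclidean distance and treats a point as anomalous only when its distance to every cluster centre exceeds that cluster's percentile radius. The function name and the sample centres and radii are hypothetical and are not taken from the carbon-ml code.

```python
import math

def is_anomalous(point, centers, radii):
    """True when the point lies outside the percentile radius of every cluster.

    centers[i] is the centre of cluster i and radii[i] is its percentile
    radius (ri). math.dist gives the Euclidean distance (di).
    """
    return all(math.dist(point, c) > r for c, r in zip(centers, radii))

# Hypothetical clusters: a tight one at the origin and a wide one at (10, 0).
centers = [(0.0, 0.0), (10.0, 0.0)]
radii = [1.0, 8.0]

inside_wide_cluster = is_anomalous((3.0, 0.0), centers, radii)
outside_all_clusters = is_anomalous((0.0, 5.0), centers, radii)
```

The point (3, 0) is nearest to the first centre and lies outside its radius, yet it falls inside the radius of the second cluster, so the all-clusters rule does not flag it. This is where the rule can differ from a check against the nearest centre only.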
>>>>
>>>> [image: Inline images 2]
>>>>
>>>> :-)
>>>> Best Wishes
>>>>
>>>>
>>>>
>>>>
>>>> On 24 September 2015 at 11:43, Ashen Weerathunga <[email protected]>
>>>> wrote:
>>>>
>>>>> Variables in the above diagram:
>>>>>
>>>>>    - Cc1, Cc2, Cc3 - cluster centers
>>>>>
>>>>>
>>>>>    - r1 - the percentile distance of cluster 1, i.e. the chosen
>>>>>    percentile of the distances from all points of cluster 1 to their
>>>>>    cluster center (Cc1)
>>>>>    (this is considered the boundary of cluster 1)
>>>>>
>>>>>
>>>>>    - d1 - the distance between a particular data point and its closest
>>>>>    cluster center (Cc1)
>>>>>
>>>>>
>>>>> On Thu, Sep 24, 2015 at 11:25 AM, Ashen Weerathunga <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Thanks for the suggestion!
>>>>>>
>>>>>> This diagram shows how the algorithm detects anomalous behavior. As
>>>>>> in the diagram, K means clustering produces a set of clusters of
>>>>>> normal data and some deviating points that behave as anomalies. Since
>>>>>> we use a percentile distance to define the cluster boundaries, we can
>>>>>> exclude those anomalous points from the clusters. When a new data
>>>>>> point arrives, its closest cluster center is determined, and by
>>>>>> comparing distances we can tell whether the point belongs to that
>>>>>> cluster. If it does not, the algorithm flags it as an anomaly.
>>>>>>
>>>>>> [image: Inline image 3]
>>>>>> I hope this gives a clearer view of the algorithm.
>>>>>>
>>>>>> Thanks,
>>>>>> Ashen
>>>>>>
>>>>>> On Wed, Sep 23, 2015 at 6:11 PM, Nirmal Fernando <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Thanks, Ashen! A few diagrams would help readers understand the
>>>>>>> algorithm better.
>>>>>>>
>>>>>>> On Wed, Sep 23, 2015 at 6:03 PM, Ashen Weerathunga <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> I am currently integrating an anomaly detection feature into WSO2
>>>>>>>> ML. Some anomaly/fraud detection features are already implemented
>>>>>>>> in CEP/DAS using different approaches, but this one uses a machine
>>>>>>>> learning approach, namely K means clustering. I have used the
>>>>>>>> K means algorithm provided by Apache Spark MLlib, which is already
>>>>>>>> used in WSO2 ML.
>>>>>>>>
>>>>>>>> This feature supports both labeled and unlabeled data. The user can
>>>>>>>> build a model from existing data and use it for prediction.
>>>>>>>>
>>>>>>>> The main steps of this feature are as follows:
>>>>>>>>
>>>>>>>>    - After the preprocessing steps, the user has to select the
>>>>>>>>    algorithm. There are two algorithms under the Anomaly Detection
>>>>>>>>    category:
>>>>>>>>       - K Means with Unlabeled data
>>>>>>>>       - K Means with Labeled data - for users who have labeled data
>>>>>>>>    - If the user selects the K Means with Labeled data option, the
>>>>>>>>    user should also input the normal label value(s) and the train
>>>>>>>>    data fraction.
>>>>>>>>    - In the next step the user has to input three parameters:
>>>>>>>>       - Maximum number of iterations
>>>>>>>>       - Number of normal clusters
>>>>>>>>       - Percentile value
>>>>>>>>    - The model is then built using those parameters.
>>>>>>>>    - A model summary is provided for the labeled data option,
>>>>>>>>    showing the model accuracy measures, confusion matrix, etc.
>>>>>>>>    - For prediction the user has two options: upload new data as a
>>>>>>>>    CSV or TSV file, or enter new data values manually. The
>>>>>>>>    prediction shows whether each new data point is an anomaly or
>>>>>>>>    not.
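As a rough illustration of the model summary for the labeled data option, the sketch below builds a confusion matrix from the user-supplied normal label(s) and the anomaly predictions, treating anomaly as the positive class. The thread does not say which accuracy measures the summary reports, so the function names, the label values, and the choice of plain accuracy are assumptions for illustration only.

```python
def confusion_matrix(actual_labels, predicted_anomaly, normal_labels):
    """Count TP/FP/TN/FN with 'anomaly' as the positive class.

    A row is actually anomalous when its label is not one of the
    user-supplied normal labels (an assumption about how labels are used).
    """
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for label, predicted in zip(actual_labels, predicted_anomaly):
        actual = label not in normal_labels
        if actual and predicted:
            counts["tp"] += 1
        elif not actual and predicted:
            counts["fp"] += 1
        elif actual and not predicted:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts

def accuracy(counts):
    """Fraction of rows classified correctly; 0.0 for an empty matrix."""
    total = sum(counts.values())
    return (counts["tp"] + counts["tn"]) / total if total else 0.0

# Hypothetical test split: labels from the dataset, predictions from the model.
labels = ["normal", "normal", "fraud", "fraud", "normal"]
predictions = [False, True, True, False, False]
cm = confusion_matrix(labels, predictions, normal_labels={"normal"})
acc = accuracy(cm)
```

Because anomalies are rare, plain accuracy can look high even when most anomalies are missed, so the per-class counts in the matrix are usually the more informative part of such a summary.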
>>>>>>>>
>>>>>>>> The methodology used is as follows:
>>>>>>>>
>>>>>>>>    - First, the dataset is clustered using the K means algorithm
>>>>>>>>    according to the hyperparameters the user provided.
>>>>>>>>    - Since in real-world anomaly detection the positive (anomaly)
>>>>>>>>    instances are very rare, we assume that those anomalies lie
>>>>>>>>    outside the clusters.
>>>>>>>>    - So we can detect them by calculating the cluster boundaries.
>>>>>>>>    This is how we identify the cluster boundaries:
>>>>>>>>       - First, calculate the distances between all data points and
>>>>>>>>       their respective cluster centers.
>>>>>>>>       - Then, for each cluster, take the chosen percentile of its
>>>>>>>>       distances as that cluster's boundary.
>>>>>>>>    - When a new data point arrives, its closest cluster center is
>>>>>>>>    found using the K means predict function.
>>>>>>>>    - Then the distance between the new data point and its cluster
>>>>>>>>    center is calculated. If it is less than the percentile distance
>>>>>>>>    value, the point is considered normal. If it is greater than the
>>>>>>>>    percentile distance value, the point is considered an anomaly,
>>>>>>>>    since it lies outside the cluster.
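The boundary and prediction steps above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the cluster centers are taken as already trained (in the feature they come from Spark MLlib K means), distance is Euclidean, the percentile uses a nearest-rank definition, and a point exactly on the boundary is treated as normal. All function names and the sample data are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def closest_center(point, centers):
    """Return (index, distance) of the nearest cluster center."""
    distances = [euclidean(point, c) for c in centers]
    idx = min(range(len(centers)), key=distances.__getitem__)
    return idx, distances[idx]

def nearest_rank_percentile(values, pct):
    """Nearest-rank percentile (pct in 0..100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100.0))
    return ordered[rank - 1]

def cluster_boundaries(points, centers, pct):
    """Percentile distance of each cluster, used as its boundary."""
    per_cluster = [[] for _ in centers]
    for p in points:
        idx, dist = closest_center(p, centers)
        per_cluster[idx].append(dist)
    # An empty cluster gets a zero boundary (an assumption of this sketch).
    return [nearest_rank_percentile(d, pct) if d else 0.0 for d in per_cluster]

def is_anomaly(point, centers, boundaries):
    """A point is an anomaly when it lies beyond its cluster's boundary."""
    idx, dist = closest_center(point, centers)
    return dist > boundaries[idx]

# Hypothetical trained centers and training data with one outlier per cluster.
centers = [(0.0, 0.0), (10.0, 10.0)]
training = [(0, 1), (1, 0), (0, -1), (-1, 0), (0, 5),
            (10, 11), (11, 10), (10, 9), (9, 10), (10, 14)]
boundaries = cluster_boundaries(training, centers, pct=80)
```

With an 80th percentile, the single far point in each cluster does not stretch the boundary, so both (0, 5) and a point midway between the clusters such as (5, 5) are flagged, while (0.5, 0.5) is treated as normal.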
>>>>>>>>
>>>>>>>> Most of the work has been completed by now. Please let me know if
>>>>>>>> there are any issues or improvements to be made.
>>>>>>>> https://github.com/ashensw/carbon-ml/tree/fraud_detection
>>>>>>>>
>>>>>>>> Thanks and Regards,
>>>>>>>> Ashen
>>>>>>>>
>>>>>>>> --
>>>>>>>> *Ashen Weerathunga*
>>>>>>>> Software Engineer - Intern
>>>>>>>> WSO2 Inc.: http://wso2.com
>>>>>>>> lean.enterprise.middleware
>>>>>>>>
>>>>>>>> Email: [email protected]
>>>>>>>> Mobile: +94 716042995 <94716042995>
>>>>>>>> LinkedIn:
>>>>>>>> *http://lk.linkedin.com/in/ashenweerathunga
>>>>>>>> <http://lk.linkedin.com/in/ashenweerathunga>*
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>> Thanks & regards,
>>>>>>> Nirmal
>>>>>>>
>>>>>>> Team Lead - WSO2 Machine Learner
>>>>>>> Associate Technical Lead - Data Technologies Team, WSO2 Inc.
>>>>>>> Mobile: +94715779733
>>>>>>> Blog: http://nirmalfdo.blogspot.com/
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Architecture mailing list
>>>>> [email protected]
>>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>> Sinnathamby Mahesan
>>>>
>>>>
>>>>
>>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>>
>
>


-- 

Thanks & regards,
Nirmal

Team Lead - WSO2 Machine Learner
Associate Technical Lead - Data Technologies Team, WSO2 Inc.
Mobile: +94715779733
Blog: http://nirmalfdo.blogspot.com/
