Re: [Architecture] [ML] Anomaly Detection Feature for WSO2 ML

Srinath Perera Thu, 19 Nov 2015 20:32:24 -0800

Can we write an article?

On Thu, Nov 19, 2015 at 10:43 AM, Seshika Fernando <[email protected]> wrote:


> Well done, Ashen. The documentation looks good too. Will review later.
>
> seshi
>
> On Thu, Nov 19, 2015 at 10:35 AM, Ashen Weerathunga <[email protected]>
> wrote:
>
>> Hi all,
>>
>> This feature has been implemented in ML and released with WSO2 Machine
>> Learner 1.1.0 - Milestone 1
>> <https://github.com/wso2/product-ml/releases/tag/v1.1.0-m1>. Thanks
>> everyone for your ideas and support. Please find the attachments.
>>
>> [1] PR - carbon-ml
>> [2] PR - product-ml
>>
>> [1] https://github.com/wso2/carbon-ml/pull/138
>> [2] https://github.com/wso2/product-ml/pull/263
>>
>> Thanks and Regards,
>> Ashen
>>
>> On Mon, Sep 28, 2015 at 11:12 PM, Ashen Weerathunga <[email protected]>
>> wrote:
>>
>>> Sure, thanks Mahesan!
>>>
>>> On Mon, Sep 28, 2015 at 9:51 AM, Sinnathamby Mahesan <
>>> [email protected]> wrote:
>>>
>>>
>>>> ---------- Forwarded message ----------
>>>> From: Sinnathamby Mahesan <[email protected]>
>>>> Date: 28 September 2015 at 09:50
>>>> Subject: Re: [Architecture] [ML] Anomaly Detection Feature for WSO2 ML
>>>> To: [email protected]
>>>> Cc: Nirmal Fernando <[email protected]>
>>>>
>>>>
>>>> Dear Ashen
>>>> I know you have programmed this correctly,
>>>>
>>>> but here too
>>>> it is better to show that
>>>>
>>>> if (di > ri) for all i = 1..k  =>  Anomalous
>>>>
>>>> where k is the number of clusters
>>>> di is the distance between the point under consideration and the
>>>> cluster centre i
>>>> and
>>>> ri is the percentile radius of cluster i
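A minimal Python sketch of the all-clusters rule: a point is anomalous only when its distance d_i to every cluster centre exceeds that cluster's percentile radius r_i. It assumes Euclidean distance and that the radii are already computed; the function name `is_anomalous_all_clusters` and the example centres are hypothetical, not part of WSO2 ML.

```python
import math
from typing import Sequence

def is_anomalous_all_clusters(point: Sequence[float],
                              centers: Sequence[Sequence[float]],
                              radii: Sequence[float]) -> bool:
    """True when the point is outside every cluster: d_i > r_i for all i."""
    # math.dist gives the Euclidean distance (assumed metric) to each centre.
    return all(math.dist(point, c) > r for c, r in zip(centers, radii))

# Hypothetical two-cluster model: centres and their percentile radii.
centers = [(0.0, 0.0), (10.0, 0.0)]
radii = [1.0, 5.0]
print(is_anomalous_all_clusters((0.5, 0.0), centers, radii))   # False: inside cluster 1
print(is_anomalous_all_clusters((20.0, 0.0), centers, radii))  # True: outside both
```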
>>>>
>>>>
>>>> [image: Inline images 2]
>>>>
>>>> :-)
>>>> Best Wishes
>>>>
>>>>
>>>>
>>>>
>>>> On 24 September 2015 at 11:43, Ashen Weerathunga <[email protected]>
>>>> wrote:
>>>>
>>>>> Variables in the diagram above:
>>>>>
>>>>>    - Cc1, Cc2, Cc3 - cluster centers
>>>>>    - r1 - the chosen percentile of the distances from all points in
>>>>>    cluster 1 to their cluster center (Cc1); this is treated as the
>>>>>    boundary of cluster 1
>>>>>    - d1 - distance between a particular data point and its closest
>>>>>    cluster center (Cc1)
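The boundary r1 can be sketched as a percentile of the member distances. This assumes Euclidean distance and the nearest-rank percentile definition; the thread does not say which percentile interpolation the implementation uses, and `percentile_radius` is a hypothetical name.

```python
import math
from typing import Sequence

def percentile_radius(points: Sequence[Sequence[float]],
                      center: Sequence[float], p: float) -> float:
    """Nearest-rank p-th percentile of the distances from a cluster's
    points to its centre; used as that cluster's boundary."""
    if not points:
        raise ValueError("cluster has no points")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    distances = sorted(math.dist(x, center) for x in points)
    rank = math.ceil(p * len(distances) / 100)  # 1-based nearest rank
    return distances[rank - 1]

# Ten points at distances 1..10 from the centre (0, 0).
cluster = [(float(d), 0.0) for d in range(1, 11)]
print(percentile_radius(cluster, (0.0, 0.0), 90))  # 9.0
```

With p = 90 the farthest member (distance 10) falls outside r1 = 9.0, which is how a percentile boundary leaves the most deviant points out of the cluster.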
>>>>>
>>>>>
>>>>> On Thu, Sep 24, 2015 at 11:25 AM, Ashen Weerathunga <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Thanks for the suggestion!
>>>>>>
>>>>>> This diagram shows how the algorithm detects anomalous behavior. As
>>>>>> shown, K-means clustering produces a set of clusters of normal data plus
>>>>>> some deviating points that behave as anomalies. Since we use a
>>>>>> percentile distance to define each cluster's boundary, those anomalous
>>>>>> points fall outside the clusters. When a new data point arrives, its
>>>>>> closest cluster center is calculated, and by comparing distances we can
>>>>>> decide whether the point belongs to that cluster. If it does not, the
>>>>>> algorithm flags it as an anomaly.
>>>>>>
>>>>>> [image: Inline image 3]
>>>>>> Hope this gives a clearer view of the algorithm.
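The closest-centre decision described here (find the nearest centre, then compare the point's distance with that cluster's boundary) could look like this in Python. Euclidean distance is assumed, and `closest_center` and `is_anomalous_closest` are hypothetical names; in the actual feature the closest centre comes from the K-means predict function.

```python
import math
from typing import Sequence

def closest_center(point: Sequence[float],
                   centers: Sequence[Sequence[float]]) -> int:
    """Index of the nearest centre (the role of a K-means predict call)."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))

def is_anomalous_closest(point: Sequence[float],
                         centers: Sequence[Sequence[float]],
                         radii: Sequence[float]) -> bool:
    """Anomalous if the point lies beyond the boundary of its closest cluster."""
    i = closest_center(point, centers)
    return math.dist(point, centers[i]) > radii[i]

centers = [(0.0, 0.0), (10.0, 0.0)]
radii = [1.0, 6.0]
print(is_anomalous_closest((0.5, 0.0), centers, radii))  # False
print(is_anomalous_closest((4.8, 0.0), centers, radii))  # True
```

The second point is nearest to cluster 1 (distance 4.8 > r1 = 1.0) and is flagged, even though it lies inside cluster 2's wider boundary (distance 5.2 < 6.0). A rule that checks every cluster, as Mahesan suggests in this thread, would treat it as normal; which behaviour is preferable likely depends on the data.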
>>>>>>
>>>>>> Thanks,
>>>>>> Ashen
>>>>>>
>>>>>> On Wed, Sep 23, 2015 at 6:11 PM, Nirmal Fernando <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Thanks Ashen! A few diagrams will help readers understand the
>>>>>>> algorithm better.
>>>>>>>
>>>>>>> On Wed, Sep 23, 2015 at 6:03 PM, Ashen Weerathunga <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> I am currently integrating an anomaly detection feature into
>>>>>>>> WSO2 ML. Some anomaly/fraud detection features are already
>>>>>>>> implemented in CEP/DAS using different approaches, but this one
>>>>>>>> uses a machine learning approach: K-means clustering. I have used
>>>>>>>> the K-means algorithm provided by Apache Spark MLlib, which WSO2 ML
>>>>>>>> already uses.
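WSO2 ML delegates the clustering itself to Spark MLlib. For readers unfamiliar with K-means, the following is a plain-Python Lloyd's iteration, not Spark's implementation: for simplicity it takes the first k points as the initial centres (MLlib's KMeans uses k-means|| initialisation by default) and stops early once the centres stop moving. The `max_iterations` argument plays the role of the "Maximum number of iterations" parameter; the function name `kmeans` is hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]

def kmeans(points: Sequence[Point], k: int, max_iterations: int = 20) -> List[Point]:
    """Plain Lloyd's K-means with the first k points as initial centres."""
    if not 0 < k <= len(points):
        raise ValueError("k must be in 1..len(points)")
    centers = [tuple(p) for p in points[:k]]
    for _ in range(max_iterations):
        # Assignment step: group points by their nearest centre.
        groups: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda j: math.dist(p, centers[j]))
            groups[i].append(p)
        # Update step: move each centre to the mean of its group
        # (an empty group keeps its previous centre).
        new_centers = [
            tuple(sum(coord) / len(g) for coord in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
        if new_centers == centers:
            break  # converged: no centre moved
        centers = new_centers
    return centers

pts = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (1.0, 0.0), (10.0, 11.0), (11.0, 10.0)]
print(kmeans(pts, k=2, max_iterations=10))  # roughly [(0.33, 0.33), (10.33, 10.33)]
```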
>>>>>>>>
>>>>>>>> This feature supports both labeled and unlabeled data. Users can
>>>>>>>> build a model from existing data and use it for prediction.
>>>>>>>>
>>>>>>>> The main steps of this feature are as follows:
>>>>>>>>
>>>>>>>>    - After the preprocessing steps, the user selects the algorithm.
>>>>>>>>    There are two algorithms under the Anomaly Detection category:
>>>>>>>>       - K Means with Unlabeled data
>>>>>>>>       - K Means with Labeled data (for users who have labeled data)
>>>>>>>>    - If the user selects K Means with Labeled data, they must also
>>>>>>>>    enter the normal label value(s) and the training data fraction.
>>>>>>>>    - In the next step, the user enters three parameters:
>>>>>>>>       - Maximum number of iterations
>>>>>>>>       - Number of normal clusters
>>>>>>>>       - Percentile value
>>>>>>>>    The model is then built using those parameters.
>>>>>>>>    - For the labeled data option, a model summary is provided,
>>>>>>>>    showing model accuracy measures, the confusion matrix, etc.
>>>>>>>>    - For prediction, the user can either upload new data as a CSV or
>>>>>>>>    TSV file or enter new data values manually. The prediction shows
>>>>>>>>    whether each new data point is an anomaly.
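For the labeled-data option, the model summary compares predictions with the known labels. A minimal sketch, assuming that any label outside the user-supplied normal label(s) marks a true anomaly and that "anomaly" is the positive class (as in the methodology below); `confusion_counts` and `accuracy` are hypothetical helpers, not WSO2 ML APIs.

```python
from typing import Dict, Hashable, Iterable, Set

def confusion_counts(true_labels: Iterable[Hashable],
                     predicted_anomaly: Iterable[bool],
                     normal_labels: Set[Hashable]) -> Dict[str, int]:
    """Count TP/FP/TN/FN with 'anomaly' as the positive class."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for label, flagged in zip(true_labels, predicted_anomaly):
        actual = label not in normal_labels  # truly anomalous?
        if actual and flagged:
            counts["TP"] += 1
        elif flagged:
            counts["FP"] += 1
        elif actual:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return counts

def accuracy(counts: Dict[str, int]) -> float:
    """Fraction of correct predictions (0.0 for an empty evaluation set)."""
    total = sum(counts.values())
    return (counts["TP"] + counts["TN"]) / total if total else 0.0

labels = ["normal", "normal", "fraud", "normal", "fraud"]
flags = [False, True, True, False, False]
c = confusion_counts(labels, flags, {"normal"})
print(c, accuracy(c))  # {'TP': 1, 'FP': 1, 'TN': 2, 'FN': 1} 0.6
```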
>>>>>>>>
>>>>>>>> The methodology used is as follows:
>>>>>>>>
>>>>>>>>    - First, the dataset is clustered with the K-means algorithm using
>>>>>>>>    the hyperparameters the user provided.
>>>>>>>>    - In real-world anomaly detection, positive (anomalous) instances
>>>>>>>>    are very rare, so we assume that anomalies lie outside the
>>>>>>>>    clusters.
>>>>>>>>    - We can therefore detect them by calculating the cluster
>>>>>>>>    boundaries, as follows:
>>>>>>>>       - First, calculate the distance between each data point and
>>>>>>>>       its cluster center.
>>>>>>>>       - Then, for each cluster, take the chosen percentile of those
>>>>>>>>       distances as the cluster boundary.
>>>>>>>>    - When a new data point arrives, its closest cluster center is
>>>>>>>>    found with the K-means predict function.
>>>>>>>>    - Then the distance between the new data point and its cluster
>>>>>>>>    center is calculated. If it is less than the percentile distance,
>>>>>>>>    the point is considered normal; if it is greater, the point is
>>>>>>>>    considered an anomaly, since it lies outside the cluster.
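The boundary-fitting and prediction steps above can be sketched end to end in Python, given centres already produced by K-means. Assumptions: Euclidean distance, the nearest-rank percentile, a distance exactly equal to the boundary counted as normal (the description leaves that case open), a radius of 0.0 for an empty cluster, and the hypothetical names `fit_boundaries` and `predict`.

```python
import math
from typing import List, Sequence

Point = Sequence[float]

def nearest(point: Point, centers: Sequence[Point]) -> int:
    """Index of the closest centre."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))

def fit_boundaries(points: Sequence[Point], centers: Sequence[Point],
                   p: float) -> List[float]:
    """Per-cluster boundary = p-th percentile (nearest rank) of member distances."""
    members: List[List[float]] = [[] for _ in centers]
    for x in points:
        i = nearest(x, centers)
        members[i].append(math.dist(x, centers[i]))
    radii = []
    for ds in members:
        if not ds:
            radii.append(0.0)  # assumption: an empty cluster accepts only its centre
            continue
        ds.sort()
        rank = max(1, math.ceil(p * len(ds) / 100))
        radii.append(ds[rank - 1])
    return radii

def predict(point: Point, centers: Sequence[Point], radii: Sequence[float]) -> str:
    """'anomaly' if the point lies beyond its closest cluster's boundary."""
    i = nearest(point, centers)
    return "anomaly" if math.dist(point, centers[i]) > radii[i] else "normal"

centers = [(0.0, 0.0), (10.0, 0.0)]
train = [(0.0, float(d)) for d in range(1, 11)] + [
    (10.0, 0.5), (10.0, -0.5), (10.5, 0.0), (9.5, 0.0)]
radii = fit_boundaries(train, centers, p=90)
print(radii)                                # [9.0, 0.5]
print(predict((0.0, 5.0), centers, radii))  # normal
print(predict((10.0, 2.0), centers, radii)) # anomaly
```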
>>>>>>>>
>>>>>>>> Most of the work has been completed by now. Please let me know if
>>>>>>>> there are any issues or improvements to be made.
>>>>>>>> https://github.com/ashensw/carbon-ml/tree/fraud_detection
>>>>>>>>
>>>>>>>> Thanks and Regards,
>>>>>>>> Ashen
>>>>>>>>
>>>>>>>> --
>>>>>>>> *Ashen Weerathunga*
>>>>>>>> Software Engineer - Intern
>>>>>>>> WSO2 Inc.: http://wso2.com
>>>>>>>> lean.enterprise.middleware
>>>>>>>>
>>>>>>>> Email: [email protected]
>>>>>>>> Mobile: +94 716042995 <94716042995>
>>>>>>>> LinkedIn:
>>>>>>>> *http://lk.linkedin.com/in/ashenweerathunga
>>>>>>>> <http://lk.linkedin.com/in/ashenweerathunga>*
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>> Thanks & regards,
>>>>>>> Nirmal
>>>>>>>
>>>>>>> Team Lead - WSO2 Machine Learner
>>>>>>> Associate Technical Lead - Data Technologies Team, WSO2 Inc.
>>>>>>> Mobile: +94715779733
>>>>>>> Blog: http://nirmalfdo.blogspot.com/
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Architecture mailing list
>>>>> [email protected]
>>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>> Sinnathamby Mahesan
>>>>
>>>>
>>>>
>>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>>
>>>
>>
>
>


-- 
============================
Blog: http://srinathsview.blogspot.com twitter:@srinath_perera
Site: http://people.apache.org/~hemapani/
Photos: http://www.flickr.com/photos/hemapani/
Phone: 0772360902
