Well done, Ashen. The documentation looks good too. I will review it later.

seshi

On Thu, Nov 19, 2015 at 10:35 AM, Ashen Weerathunga <[email protected]> wrote:

> Hi all,
>
> This feature has been implemented in ML and released with WSO2 Machine Learner
> 1.1.0 - Milestone 1
> <https://github.com/wso2/product-ml/releases/tag/v1.1.0-m1>. Thanks,
> everyone, for your ideas and support. Please find the attachments.
>
> [1] PR - carbon-ml: https://github.com/wso2/carbon-ml/pull/138
> [2] PR - product-ml: https://github.com/wso2/product-ml/pull/263
>
> Thanks and Regards,
> Ashen
>
> On Mon, Sep 28, 2015 at 11:12 PM, Ashen Weerathunga <[email protected]>
> wrote:
>
>> Sure, thanks Mahesan!
>>
>> On Mon, Sep 28, 2015 at 9:51 AM, Sinnathamby Mahesan <
>> [email protected]> wrote:
>>
>>
>>> ---------- Forwarded message ----------
>>> From: Sinnathamby Mahesan <[email protected]>
>>> Date: 28 September 2015 at 09:50
>>> Subject: Re: [Architecture] [ML] Anomaly Detection Feature for WSO2 ML
>>> To: [email protected]
>>> Cc: Nirmal Fernando <[email protected]>
>>>
>>>
>>> Dear Ashen,
>>> I know you have programmed it correctly,
>>>
>>> but here too
>>> it is better to show that
>>>
>>> if (di > ri) for all i = 1..k  =>  Anomalous
>>>
>>> where k is the number of clusters,
>>> di is the distance between the point under consideration and the
>>> cluster centre i,
>>> and
>>> ri is the percentile radius of cluster i.
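A minimal sketch of the all-clusters check, taking "anomalous" to mean that the distance di exceeds the percentile radius ri for every cluster i, which is the reading consistent with the methodology quoted further down the thread. The function name, the sample centres and the radii are made up for illustration; they are not taken from the carbon-ml code.

```python
import math

def is_anomalous(point, centers, radii):
    """True when the point lies outside the percentile radius of every cluster."""
    return all(math.dist(point, c) > r for c, r in zip(centers, radii))

# Hypothetical model: two cluster centres with different percentile radii.
centers = [(0.0, 0.0), (10.0, 0.0)]
radii = [1.0, 6.0]

# (4.5, 0) is nearest to the first centre and outside its radius (4.5 > 1),
# but it is still inside the wider second cluster (5.5 <= 6).
inside_second = is_anomalous((4.5, 0.0), centers, radii)

# (0, 20) is outside both radii.
outside_both = is_anomalous((0.0, 20.0), centers, radii)
```

The first point shows why stating the condition over all i matters when the radii differ: a check against the nearest centre alone would flag it, while the all-clusters condition does not.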
>>>
>>>
>>> [image: Inline images 2]
>>>
>>> :-)
>>> Best Wishes
>>>
>>>
>>>
>>>
>>> On 24 September 2015 at 11:43, Ashen Weerathunga <[email protected]> wrote:
>>>
>>>> Variables in the above diagram:
>>>>
>>>>    - Cc1, Cc2, Cc3 - Cluster centers
>>>>
>>>>
>>>>    - r1 - ith percentile of the distances from all the points of
>>>>    cluster 1 to their cluster center (Cc1)
>>>>    (this is considered the boundary of cluster 1)
>>>>
>>>>
>>>>    - d1 - distance between a particular data point and its closest
>>>>    cluster center (Cc1)
>>>>
>>>>
>>>> On Thu, Sep 24, 2015 at 11:25 AM, Ashen Weerathunga <[email protected]>
>>>> wrote:
>>>>
>>>>> Thanks for the suggestion!
>>>>>
>>>>> This diagram shows how the algorithm detects anomalous behavior. As in
>>>>> the diagram, when we do the K-means clustering there will be a set of
>>>>> clusters of normal data and some deviated points that behave as
>>>>> anomalies. Since we use a percentile distance to identify cluster
>>>>> boundaries, we can exclude those anomalous points from the clusters.
>>>>> When a new data point arrives, its closest cluster center is calculated,
>>>>> and then, by comparing distances, we can identify whether it belongs to
>>>>> that cluster or not. If it does not, the algorithm flags it as an
>>>>> anomaly.
>>>>>
>>>>> [image: Inline image 3]
>>>>> Hope this will give a clearer view of the algorithm.
>>>>>
>>>>> Thanks,
>>>>> Ashen
>>>>>
>>>>> On Wed, Sep 23, 2015 at 6:11 PM, Nirmal Fernando <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Thanks Ashen! A few diagrams will help readers to understand the
>>>>>> algorithm better.
>>>>>>
>>>>>> On Wed, Sep 23, 2015 at 6:03 PM, Ashen Weerathunga <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I am currently integrating an anomaly detection feature into
>>>>>>> WSO2 ML. There are some anomaly/fraud detection features already
>>>>>>> implemented in CEP/DAS using different approaches, but this one uses
>>>>>>> a machine learning approach, namely K-means clustering. Basically, I
>>>>>>> have used the K-means algorithm provided by Apache Spark MLlib, which
>>>>>>> is already used in WSO2 ML.
>>>>>>>
>>>>>>> This feature supports both labeled and unlabeled data. The user can
>>>>>>> build a model using existing data and use it for prediction.
>>>>>>>
>>>>>>> The main steps of this feature are as follows:
>>>>>>>
>>>>>>>    - After the preprocessing steps, the user has to select the
>>>>>>>    algorithm. There will be two algorithms under the Anomaly Detection
>>>>>>>    category:
>>>>>>>       - K Means with Unlabeled data
>>>>>>>       - K Means with Labeled data - if the user has labeled data, the
>>>>>>>       user can go for this option
>>>>>>>    - If the user selects the K Means with Labeled data option, the user
>>>>>>>    should also input the normal label value(s) and the train data
>>>>>>>    fraction.
>>>>>>>    - In the next step the user has to input three parameters:
>>>>>>>       - Maximum number of iterations
>>>>>>>       - Number of normal clusters
>>>>>>>       - Percentile value
>>>>>>>    - Then the model will be built using those parameters.
>>>>>>>    - A model summary will be provided for the labeled data option, which
>>>>>>>    shows the model accuracy measures, confusion matrix, etc.
>>>>>>>    - In the prediction part the user has two options: upload new data
>>>>>>>    as a CSV or TSV file, or enter new data values manually. The
>>>>>>>    prediction shows whether each new data point is an anomaly or not.
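For the labeled-data option, the model summary mentions accuracy measures and a confusion matrix. A rough sketch of how such counts could be derived from the user-supplied normal label(s) is shown here, treating "anomaly" as the positive class. The function names, the sample labels and the predictions are hypothetical and are not taken from the WSO2 ML implementation.

```python
def confusion_counts(actual_labels, predicted_anomaly, normal_labels):
    """Count TP/FP/TN/FN, treating 'anomaly' as the positive class."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for label, flagged in zip(actual_labels, predicted_anomaly):
        # Any label that is not among the normal labels counts as an anomaly.
        is_anomaly = label not in normal_labels
        if flagged and is_anomaly:
            counts["tp"] += 1
        elif flagged:
            counts["fp"] += 1
        elif is_anomaly:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts

def accuracy(counts):
    """Fraction of points whose predicted class matches the actual class."""
    total = sum(counts.values())
    return (counts["tp"] + counts["tn"]) / total if total else 0.0

# Hypothetical test split: "ok" is the normal label, anything else is an anomaly.
actual = ["ok", "ok", "fraud", "ok", "fraud"]
predicted = [False, True, True, False, False]
counts = confusion_counts(actual, predicted, normal_labels={"ok"})
```

Because anomalies are rare, plain accuracy can look good even when most anomalies are missed, so the individual counts are usually more informative than the single accuracy figure.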
>>>>>>>
>>>>>>> The methodology used is as follows:
>>>>>>>
>>>>>>>    - First the dataset is clustered using the K-means algorithm
>>>>>>>    according to the hyperparameters that the user provided.
>>>>>>>    - Since in real-world anomaly detection scenarios the
>>>>>>>    positive (anomaly) instances are very rare, we assume that those
>>>>>>>    anomalies will lie outside the clusters.
>>>>>>>    - So we can detect them by calculating the cluster boundaries.
>>>>>>>    This is how we identify the cluster boundaries:
>>>>>>>       - First calculate all the distances between data points and
>>>>>>>       their respective cluster centers.
>>>>>>>       - Then select the percentile value from the distances of each
>>>>>>>       cluster as that cluster's boundary.
>>>>>>>    - When a new data point arrives, the closest cluster center is
>>>>>>>    calculated by the K-means predict function.
>>>>>>>    - Then the distance between the new data point and its cluster
>>>>>>>    center is calculated. If it is less than the percentile distance
>>>>>>>    value, the point is considered normal. If it is greater than the
>>>>>>>    percentile distance value, it is considered an anomaly since it
>>>>>>>    lies outside the cluster.
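The boundary and prediction steps above can be sketched in plain Python. This is only an illustration under stated assumptions: the cluster centers are taken as already computed (the thread uses Spark MLlib K-means for that step), the percentile uses the nearest-rank definition, a distance exactly equal to the boundary is treated as normal, and all function names and sample data are hypothetical.

```python
import math

def percentile(values, pct):
    """Nearest-rank percentile (one of several common definitions)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]

def nearest_center(point, centers):
    """Index of the closest cluster center, as a K-means predict call would return."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))

def cluster_boundaries(points, centers, pct):
    """Per cluster, the percentile of distances from its points to its center."""
    distances = [[] for _ in centers]
    for p in points:
        i = nearest_center(p, centers)
        distances[i].append(math.dist(p, centers[i]))
    # An empty cluster gets a zero boundary (assumption for this sketch).
    return [percentile(d, pct) if d else 0.0 for d in distances]

def predict_anomaly(point, centers, boundaries):
    """True when the point is farther from its closest center than that cluster's boundary."""
    i = nearest_center(point, centers)
    return math.dist(point, centers[i]) > boundaries[i]

# Hypothetical centers and training data: each cluster has one far-out point.
centers = [(0.0, 0.0), (10.0, 10.0)]
training = [(0, 1), (1, 0), (0, -1), (0, 5),
            (10, 11), (11, 10), (10, 9), (10, 14)]
boundaries = cluster_boundaries(training, centers, pct=75)

normal_point = predict_anomaly((0.5, 0.5), centers, boundaries)
far_point = predict_anomaly((0, 5), centers, boundaries)
```

With a 75th percentile, the far-out training points (0, 5) and (10, 14) fall outside the boundary of their own cluster, which is the effect the percentile is meant to achieve: the boundary is set by the bulk of each cluster, not by its most distant members.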
>>>>>>>
>>>>>>> Most of the work has been completed by now. Please let me know if there
>>>>>>> are any issues or improvements to be made.
>>>>>>> https://github.com/ashensw/carbon-ml/tree/fraud_detection
>>>>>>>
>>>>>>> Thanks and Regards,
>>>>>>> Ashen
>>>>>>>
>>>>>>> --
>>>>>>> *Ashen Weerathunga*
>>>>>>> Software Engineer - Intern
>>>>>>> WSO2 Inc.: http://wso2.com
>>>>>>> lean.enterprise.middleware
>>>>>>>
>>>>>>> Email: [email protected]
>>>>>>> Mobile: +94 716042995
>>>>>>> LinkedIn: http://lk.linkedin.com/in/ashenweerathunga
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>>
>>>>>> Thanks & regards,
>>>>>> Nirmal
>>>>>>
>>>>>> Team Lead - WSO2 Machine Learner
>>>>>> Associate Technical Lead - Data Technologies Team, WSO2 Inc.
>>>>>> Mobile: +94715779733
>>>>>> Blog: http://nirmalfdo.blogspot.com/
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Architecture mailing list
>>>> [email protected]
>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>>>
>>>>
>>>
>>>
>>> --
>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>> Sinnathamby Mahesan
>>>
>>>
>>>
>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>
>>>
>>>
>>>
>>
>>
>>
>>
>
>
>
>
