Well done, Ashen. The documentation looks good too. Will review later.

seshi

On Thu, Nov 19, 2015 at 10:35 AM, Ashen Weerathunga <[email protected]> wrote:

> Hi all,
>
> This feature was implemented in ML and released with WSO2 Machine Learner
> 1.1.0 - Milestone 1
> <https://github.com/wso2/product-ml/releases/tag/v1.1.0-m1>. Thanks,
> everyone, for your ideas and support. Please find the attachments.
>
> [1] PR - carbon-ml
> [2] PR - product-ml
>
> [1] https://github.com/wso2/carbon-ml/pull/138
> [2] https://github.com/wso2/product-ml/pull/263
>
> Thanks and Regards,
> Ashen
>
> On Mon, Sep 28, 2015 at 11:12 PM, Ashen Weerathunga <[email protected]>
> wrote:
>
>> Sure, thanks Mahesan!
>>
>> On Mon, Sep 28, 2015 at 9:51 AM, Sinnathamby Mahesan <[email protected]>
>> wrote:
>>
>>> ---------- Forwarded message ----------
>>> From: Sinnathamby Mahesan <[email protected]>
>>> Date: 28 September 2015 at 09:50
>>> Subject: Re: [Architecture] [ML] Anomaly Detection Feature for WSO2 ML
>>> To: [email protected]
>>> Cc: Nirmal Fernando <[email protected]>
>>>
>>> Dear Ashen,
>>>
>>> I know you have programmed it correctly, but here too it is better to
>>> show that
>>>
>>>     if (di > ri) for all i = 1..k => Anomalous
>>>
>>> where
>>>     k  is the number of clusters,
>>>     di is the distance between the point under consideration and
>>>        cluster centre i, and
>>>     ri is the percentile radius of cluster i.
>>>
>>> [image: Inline images 2]
>>>
>>> :-)
>>> Best Wishes
>>>
>>> On 24 September 2015 at 11:43, Ashen Weerathunga <[email protected]> wrote:
>>>
>>>> Variables of the above diagram.
>>>>
>>>> - Cc1, Cc2, Cc3 - cluster centers
>>>> - r1 - the ith percentile of the distances from all the points of
>>>>   cluster 1 to their cluster center (Cc1); this is treated as the
>>>>   boundary of cluster 1
>>>> - d1 - the distance between a particular data point and its closest
>>>>   cluster center (Cc1)
>>>>
>>>> On Thu, Sep 24, 2015 at 11:25 AM, Ashen Weerathunga <[email protected]>
>>>> wrote:
>>>>
>>>>> Thanks for the suggestion!
>>>>>
>>>>> This diagram shows how the algorithm detects anomalous behavior. As in
>>>>> the diagram, K-means clustering produces a set of clusters of normal
>>>>> data plus some deviating points that behave as anomalies. Since we use
>>>>> a percentile distance to identify the cluster boundaries, we can
>>>>> exclude those anomalous points from the clusters. When a new data
>>>>> point arrives, its closest cluster center is calculated, and by
>>>>> comparing distances we can determine whether it belongs to that
>>>>> cluster. If it does not, the algorithm flags it as an anomaly.
>>>>>
>>>>> [image: Inline image 3]
>>>>> Hope this gives a clearer view of the algorithm.
>>>>>
>>>>> Thanks,
>>>>> Ashen
>>>>>
>>>>> On Wed, Sep 23, 2015 at 6:11 PM, Nirmal Fernando <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Thanks Ashen! A few diagrams will help readers understand the
>>>>>> algorithm better.
>>>>>>
>>>>>> On Wed, Sep 23, 2015 at 6:03 PM, Ashen Weerathunga <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I am currently integrating an anomaly detection feature into WSO2
>>>>>>> ML. Some anomaly/fraud detection features are already implemented in
>>>>>>> CEP/DAS using different approaches, but this one uses a machine
>>>>>>> learning approach, namely K-means clustering. I have used the
>>>>>>> K-means algorithm provided by Apache Spark MLlib, which is already
>>>>>>> used in WSO2 ML.
>>>>>>>
>>>>>>> This feature supports both labeled and unlabeled data. The user can
>>>>>>> build a model from existing data and use it for prediction.
>>>>>>>
>>>>>>> The main steps of this feature are as follows:
>>>>>>>
>>>>>>> - After the preprocessing steps, the user selects the algorithm.
>>>>>>>   There are two algorithms under the Anomaly Detection category:
>>>>>>>    - K Means with Unlabeled data
>>>>>>>    - K Means with Labeled data - for users who have labeled data
>>>>>>> - If the user selects the K Means with Labeled data option, the
>>>>>>>   normal label value(s) and the train data fraction must also be
>>>>>>>   provided.
>>>>>>> - In the next step, the user enters three parameters:
>>>>>>>    - Maximum number of iterations
>>>>>>>    - Number of normal clusters
>>>>>>>    - Percentile value
>>>>>>> - The model is then built using those parameters.
>>>>>>> - For the labeled data option, a model summary is provided showing
>>>>>>>   the model accuracy measures, confusion matrix, etc.
>>>>>>> - For prediction, the user can either upload new data as a CSV or
>>>>>>>   TSV file or enter new data values manually. The prediction shows
>>>>>>>   whether each new data point is an anomaly or not.
>>>>>>>
>>>>>>> The methodology used is as follows:
>>>>>>>
>>>>>>> - First, the dataset is clustered using the K-means algorithm
>>>>>>>   according to the hyperparameters the user provided.
>>>>>>> - Since in real-world anomaly detection the positive (anomaly)
>>>>>>>   instances are very rare, we assume that those anomalies lie
>>>>>>>   outside the clusters.
>>>>>>> - So we can detect them by calculating the cluster boundaries,
>>>>>>>   which are identified as follows:
>>>>>>>    - First, calculate the distances between all data points and
>>>>>>>      their respective cluster centers.
>>>>>>>    - Then take the percentile value of the distances within each
>>>>>>>      cluster as that cluster's boundary.
>>>>>>> - When a new data point arrives, its closest cluster center is
>>>>>>>   determined by the K-means predict function.
>>>>>>> - Then the distance between the new data point and its cluster
>>>>>>>   center is calculated. If it is less than the percentile distance
>>>>>>>   value, the point is considered normal. If it is greater than the
>>>>>>>   percentile distance value, it is considered an anomaly, since it
>>>>>>>   lies outside the cluster.
>>>>>>>
>>>>>>> Most of the work has been completed by now. Please let me know if
>>>>>>> there are any issues or improvements to be made.
>>>>>>> https://github.com/ashensw/carbon-ml/tree/fraud_detection
>>>>>>>
>>>>>>> Thanks and Regards,
>>>>>>> Ashen
>>>>>>>
>>>>>>> --
>>>>>>> *Ashen Weerathunga*
>>>>>>> Software Engineer - Intern
>>>>>>> WSO2 Inc.: http://wso2.com
>>>>>>> lean.enterprise.middleware
>>>>>>>
>>>>>>> Email: [email protected]
>>>>>>> Mobile: +94 716042995 <94716042995>
>>>>>>> LinkedIn: *http://lk.linkedin.com/in/ashenweerathunga
>>>>>>> <http://lk.linkedin.com/in/ashenweerathunga>*
>>>>>>
>>>>>> --
>>>>>>
>>>>>> Thanks & regards,
>>>>>> Nirmal
>>>>>>
>>>>>> Team Lead - WSO2 Machine Learner
>>>>>> Associate Technical Lead - Data Technologies Team, WSO2 Inc.
>>>>>> Mobile: +94715779733
>>>>>> Blog: http://nirmalfdo.blogspot.com/
>>>>
>>>> _______________________________________________
>>>> Architecture mailing list
>>>> [email protected]
>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>>
>>> --
>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>> Sinnathamby Mahesan
>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
_______________________________________________
Architecture mailing list
[email protected]
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
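
For readers following the thread, the boundary and prediction steps described in the original proposal can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the carbon-ml implementation: the cluster centers are taken as given (in WSO2 ML they would come from the Spark MLlib K-means model), all function names are hypothetical, the percentile uses the nearest-rank method (the thread does not say which method is used), and a point lying exactly on the boundary is treated as normal (the thread leaves that case open).

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def closest_center(point, centers):
    """Return (index, distance) of the cluster center nearest to point."""
    distances = [euclidean(point, c) for c in centers]
    idx = min(range(len(centers)), key=distances.__getitem__)
    return idx, distances[idx]


def nearest_rank_percentile(values, percentile):
    """Nearest-rank percentile; an assumed method, the thread names none."""
    ordered = sorted(values)
    rank = max(1, math.ceil(percentile * len(ordered) / 100))
    return ordered[rank - 1]


def cluster_boundaries(points, centers, percentile):
    """Boundary radius r_i of each cluster, from the training points."""
    per_cluster = [[] for _ in centers]
    for p in points:
        idx, dist = closest_center(p, centers)
        per_cluster[idx].append(dist)
    # An empty cluster gets radius 0.0, so no point counts as inside it.
    return [nearest_rank_percentile(d, percentile) if d else 0.0
            for d in per_cluster]


def is_anomaly(point, centers, boundaries):
    """True if point lies outside the boundary of its closest cluster."""
    idx, dist = closest_center(point, centers)
    return dist > boundaries[idx]


# Hypothetical centers and training data, for illustration only.
centers = [(0.0, 0.0), (10.0, 10.0)]
train = [(0, 1), (1, 0), (0, -1), (-1, 0), (0, 3),
         (10, 11), (11, 10), (10, 9), (9, 10), (10, 14)]
boundaries = cluster_boundaries(train, centers, 80)
print(boundaries)
print(is_anomaly((0.5, 0.5), centers, boundaries))
print(is_anomaly((0.0, 3.0), centers, boundaries))
```

With an 80th percentile, the outlying training points (0, 3) and (10, 14) fall outside their cluster boundaries, which is the effect the percentile parameter is meant to have: a lower percentile tightens the boundary and flags more points, a higher one loosens it.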
