Can we write an article?

On Thu, Nov 19, 2015 at 10:43 AM, Seshika Fernando <[email protected]> wrote:
> Well done, Ashen. The documentation looks good too. Will review later.
>
> seshi
>
> On Thu, Nov 19, 2015 at 10:35 AM, Ashen Weerathunga <[email protected]> wrote:
>
>> Hi all,
>>
>> This feature was implemented in ML and released with WSO2 Machine
>> Learner 1.1.0 - Milestone 1
>> <https://github.com/wso2/product-ml/releases/tag/v1.1.0-m1>. Thanks,
>> everyone, for your ideas and support. Please find the attachments.
>>
>> [1] PR - carbon-ml
>> [2] PR - product-ml
>>
>> [1] https://github.com/wso2/carbon-ml/pull/138
>> [2] https://github.com/wso2/product-ml/pull/263
>>
>> Thanks and Regards,
>> Ashen
>>
>> On Mon, Sep 28, 2015 at 11:12 PM, Ashen Weerathunga <[email protected]> wrote:
>>
>>> Sure, thanks Mahesan!
>>>
>>> On Mon, Sep 28, 2015 at 9:51 AM, Sinnathamby Mahesan <[email protected]> wrote:
>>>
>>>> ---------- Forwarded message ----------
>>>> From: Sinnathamby Mahesan <[email protected]>
>>>> Date: 28 September 2015 at 09:50
>>>> Subject: Re: [Architecture] [ML] Anomaly Detection Feature for WSO2 ML
>>>> To: [email protected]
>>>> Cc: Nirmal Fernando <[email protected]>
>>>>
>>>> Dear Ashen,
>>>> I know you have programmed it correctly,
>>>> but here too it is better to show that
>>>>
>>>> if (di > ri) for all i = 1..k => Anomalous
>>>>
>>>> where k is the number of clusters,
>>>> di is the distance between the point under consideration and
>>>> cluster centre i,
>>>> and ri is the percentile radius of cluster i.
>>>>
>>>> [image: Inline images 2]
>>>>
>>>> :-)
>>>> Best Wishes
>>>>
>>>> On 24 September 2015 at 11:43, Ashen Weerathunga <[email protected]> wrote:
>>>>
>>>>> Variables in the above diagram:
>>>>>
>>>>> - Cc1, Cc2, Cc3 - cluster centers
>>>>>
>>>>> - r1 - the i-th percentile of the distances from all points of
>>>>>   cluster 1 to their cluster center (Cc1); this is treated as the
>>>>>   boundary of cluster 1
>>>>>
>>>>> - d1 - the distance between a particular data point and its closest
>>>>>   cluster center (Cc1)
>>>>>
>>>>> On Thu, Sep 24, 2015 at 11:25 AM, Ashen Weerathunga <[email protected]> wrote:
>>>>>
>>>>>> Thanks for the suggestion!
>>>>>>
>>>>>> This diagram shows how the algorithm detects anomalous behavior. As
>>>>>> the diagram shows, K-means clustering produces a set of clusters of
>>>>>> normal data plus some deviating points that behave as anomalies.
>>>>>> Because we use a percentile distance to define each cluster
>>>>>> boundary, those anomalous points are excluded from the clusters.
>>>>>> When a new data point arrives, its closest cluster center is found,
>>>>>> and comparing distances tells us whether the point belongs to that
>>>>>> cluster. If it does not, the algorithm flags it as an anomaly.
>>>>>>
>>>>>> [image: Inline image 3]
>>>>>> Hope this gives a clearer view of the algorithm.
>>>>>>
>>>>>> Thanks,
>>>>>> Ashen
>>>>>>
>>>>>> On Wed, Sep 23, 2015 at 6:11 PM, Nirmal Fernando <[email protected]> wrote:
>>>>>>
>>>>>>> Thanks Ashen! A few diagrams will help readers understand the
>>>>>>> algorithm better.
>>>>>>>
>>>>>>> On Wed, Sep 23, 2015 at 6:03 PM, Ashen Weerathunga <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> I am currently integrating an anomaly detection feature into
>>>>>>>> WSO2 ML. Some anomaly/fraud detection features are already
>>>>>>>> implemented in CEP/DAS using different approaches, but this one
>>>>>>>> uses a machine learning approach: K-means clustering.
>>>>>>>> I have used the K-means algorithm provided by Apache Spark MLlib,
>>>>>>>> which is already used in WSO2 ML.
>>>>>>>>
>>>>>>>> This feature supports both labeled and unlabeled data. Users can
>>>>>>>> build a model from existing data and use it for prediction.
>>>>>>>>
>>>>>>>> The main steps of this feature are as follows:
>>>>>>>>
>>>>>>>> - After the preprocessing steps, the user selects the algorithm.
>>>>>>>>   There are two algorithms under the Anomaly Detection category:
>>>>>>>>    - K Means with Unlabeled data
>>>>>>>>    - K Means with Labeled data - for users who have labeled data
>>>>>>>> - If the user selects K Means with Labeled data, the user must
>>>>>>>>   also enter the normal label value(s) and the train data fraction.
>>>>>>>> - In the next step the user enters three parameters:
>>>>>>>>    - Maximum number of iterations
>>>>>>>>    - Number of normal clusters
>>>>>>>>    - Percentile value
>>>>>>>> - The model is then built using those parameters.
>>>>>>>> - For the labeled data option, a model summary shows the model
>>>>>>>>   accuracy measures, confusion matrix, etc.
>>>>>>>> - For prediction, the user can either upload new data as a CSV or
>>>>>>>>   TSV file or enter new data values manually. The prediction shows
>>>>>>>>   whether each new data point is an anomaly or not.
>>>>>>>>
>>>>>>>> The methodology used is as follows:
>>>>>>>>
>>>>>>>> - First, the dataset is clustered using the K-means algorithm
>>>>>>>>   according to the hyperparameters the user provided.
>>>>>>>> - Since positive (anomaly) instances are very rare in real-world
>>>>>>>>   anomaly detection, we assume that the anomalies lie outside
>>>>>>>>   the clusters.
>>>>>>>> - So we can detect them by calculating the cluster boundaries.
>>>>>>>>   This is how we identify the cluster boundaries:
>>>>>>>>    - First, calculate the distance between each data point and
>>>>>>>>      its cluster center.
>>>>>>>>    - Then take the chosen percentile of those distances within
>>>>>>>>      each cluster as that cluster's boundary.
>>>>>>>> - When a new data point arrives, its closest cluster center is
>>>>>>>>   found using the K-means predict function.
>>>>>>>> - Then the distance between the new data point and its cluster
>>>>>>>>   center is calculated. If it is less than the percentile
>>>>>>>>   distance, the point is considered normal. If it is greater than
>>>>>>>>   the percentile distance, the point is considered an anomaly,
>>>>>>>>   since it lies outside the cluster.
>>>>>>>>
>>>>>>>> Most of the work is complete now. Please let me know if there
>>>>>>>> are any issues or improvements to be made.
>>>>>>>> https://github.com/ashensw/carbon-ml/tree/fraud_detection
>>>>>>>>
>>>>>>>> Thanks and Regards,
>>>>>>>> Ashen
>>>>>>>>
>>>>>>>> --
>>>>>>>> *Ashen Weerathunga*
>>>>>>>> Software Engineer - Intern
>>>>>>>> WSO2 Inc.: http://wso2.com
>>>>>>>> lean.enterprise.middleware
>>>>>>>>
>>>>>>>> Email: [email protected]
>>>>>>>> Mobile: +94 716042995
>>>>>>>> LinkedIn: http://lk.linkedin.com/in/ashenweerathunga
>>>>>>>
>>>>>>> --
>>>>>>> Thanks & regards,
>>>>>>> Nirmal
>>>>>>>
>>>>>>> Team Lead - WSO2 Machine Learner
>>>>>>> Associate Technical Lead - Data Technologies Team, WSO2 Inc.
>>>>>>> Mobile: +94715779733
>>>>>>> Blog: http://nirmalfdo.blogspot.com/
>>>>>
>>>>> _______________________________________________
>>>>> Architecture mailing list
>>>>> [email protected]
>>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>>>
>>>> --
>>>> Sinnathamby Mahesan
>

--
============================
Blog: http://srinathsview.blogspot.com
twitter:@srinath_perera
Site: http://people.apache.org/~hemapani/
Photos: http://www.flickr.com/photos/hemapani/
Phone: 0772360902
_______________________________________________
Architecture mailing list
[email protected]
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
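
For reference, the methodology described in the quoted proposal (cluster the data, take a percentile of the point-to-center distances as each cluster's boundary, then flag new points that fall outside their closest cluster) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the carbon-ml implementation: it uses a small pure-Python K-means with caller-supplied initial centers instead of Apache Spark MLlib, a nearest-rank percentile, and made-up sample data. The names `train`, `is_anomaly`, `percentile`, `closest` and `kmeans` are hypothetical.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def closest(point, centers):
    """Index of the center nearest to point (first one wins on ties)."""
    return min(range(len(centers)), key=lambda i: dist(point, centers[i]))


def kmeans(points, centers, max_iterations):
    """Plain Lloyd's K-means starting from the given initial centers."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iterations):
        groups = [[] for _ in centers]
        for p in points:
            groups[closest(p, centers)].append(p)
        # New center = mean of the assigned points; keep the old center
        # if a cluster received no points.
        updated = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else c
            for g, c in zip(groups, centers)
        ]
        if updated == centers:
            break
        centers = updated
    return centers


def percentile(values, pct):
    """Nearest-rank percentile (0 < pct <= 100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def train(points, initial_centers, max_iterations, pct):
    """Return the cluster centers and each cluster's percentile radius."""
    centers = kmeans(points, initial_centers, max_iterations)
    distances = [[] for _ in centers]
    for p in points:
        i = closest(p, centers)
        distances[i].append(dist(p, centers[i]))
    radii = [percentile(d, pct) if d else 0.0 for d in distances]
    return centers, radii


def is_anomaly(point, centers, radii):
    """True if the point lies beyond the boundary of its closest cluster."""
    i = closest(point, centers)
    return dist(point, centers[i]) > radii[i]


# Made-up training data: two tight groups of "normal" points.
normal = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
centers, radii = train(normal, [(0.0, 0.0), (10.0, 10.0)],
                       max_iterations=20, pct=95)

print(is_anomaly((0.5, 0.6), centers, radii))  # inside the first cluster
print(is_anomaly((5.0, 5.0), centers, radii))  # far from both clusters
```

The three arguments `max_iterations`, the number of initial centers, and `pct` correspond to the three parameters listed in the proposal (maximum number of iterations, number of normal clusters, percentile value). A lower percentile shrinks each boundary and flags more points; how a point exactly on the boundary is treated is not specified in the thread, and this sketch treats it as normal.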
