Dear Ashen,

I know you have programmed it correctly, but here too it is better to show that

    if (di > ri) for all i = 1..k  =>  Anomalous

where
    k  is the number of clusters,
    di is the distance between the point under consideration and cluster centre i, and
    ri is the percentile radius of cluster i.

[image: Inline images 2]

:-)

Best Wishes

On 24 September 2015 at 11:43, Ashen Weerathunga <[email protected]> wrote:

> Variables of the above diagram.
>
>    - Cc1, Cc2, Cc3 - Cluster centers
>
>    - r1 - ith percentile distance of the distances from all the points of
>    cluster 1 to their cluster center (Cc1)
>    (this is considered the boundary of cluster 1)
>
>    - d1 - distance between a particular data point and its closest cluster
>    center (Cc1)
>
> On Thu, Sep 24, 2015 at 11:25 AM, Ashen Weerathunga <[email protected]>
> wrote:
>
>> Thanks for the suggestion!
>>
>> This diagram shows how the algorithm detects anomalous behavior. As in the
>> diagram, when we do the K-means clustering there will be a set of clusters of
>> normal data and some deviating points which behave as anomalies. Since we
>> use a percentile distance to identify cluster boundaries, we can
>> exclude those anomalous points from the clusters. So when a new data point
>> comes, its closest cluster center is calculated, and then by comparing
>> distances we can identify whether it belongs to the cluster or not. If it
>> does not, the algorithm flags it as an anomaly.
>>
>> [image: Inline image 3]
>> Hope this gives a clearer view of the algorithm.
>>
>> Thanks,
>> Ashen
>>
>> On Wed, Sep 23, 2015 at 6:11 PM, Nirmal Fernando <[email protected]> wrote:
>>
>>> Thanks Ashen! A few diagrams will help readers understand the algorithm
>>> better.
>>>
>>> On Wed, Sep 23, 2015 at 6:03 PM, Ashen Weerathunga <[email protected]>
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I am currently integrating an anomaly detection feature into
>>>> WSO2 ML. There are some anomaly/fraud detection features already
>>>> implemented in CEP/DAS using different approaches.
>>>> But this one will be done
>>>> using a machine learning approach, namely K-means clustering. Basically I
>>>> have used the K-means algorithm provided by Apache Spark MLlib, which is
>>>> already used in WSO2 ML.
>>>>
>>>> This feature supports both labeled and unlabeled data. The user can build a
>>>> model using existing data and use it for prediction.
>>>>
>>>> The main steps of this feature are as follows:
>>>>
>>>>    - After the preprocessing steps, the user has to select the
>>>>    algorithm. There will be two algorithms under the Anomaly Detection category:
>>>>       - K Means with Unlabeled data
>>>>       - K Means with Labeled data - if the user has labeled data, the user can
>>>>       go for this option
>>>>    - If the user selects the K Means with Labeled data option, the user should
>>>>    also input the normal label value(s) and the train data fraction.
>>>>    - In the next step the user has to input three parameters:
>>>>       - Maximum number of iterations
>>>>       - Number of normal clusters
>>>>       - Percentile value
>>>>    - Then the model will be built using those parameters.
>>>>       - A model summary will be provided for the labeled data option, which
>>>>       shows the model accuracy measures, confusion matrix, etc.
>>>>    - In the prediction part the user has two options: input new
>>>>    data as a CSV or TSV file, or manually enter new data values. The
>>>>    prediction shows whether the new data point is an anomaly or not.
>>>>
>>>> The methodology used is as follows:
>>>>
>>>>    - First the dataset is clustered using the K-means algorithm
>>>>    according to the hyperparameters that the user provided.
>>>>    - Since in real-world anomaly detection scenarios the
>>>>    positive (anomaly) instances are very rare, we assume that those
>>>>    anomalies lie outside the clusters.
>>>>    - So we can detect them by calculating the cluster boundaries. This
>>>>    is how we identify the cluster boundaries:
>>>>       - First calculate all the distances between data points and
>>>>       their respective cluster centers.
>>>>       - Then select the percentile value from the distances of each
>>>>       cluster as its cluster boundary.
>>>>    - When a new data point comes, the closest cluster center is
>>>>    calculated by the K-means predict function.
>>>>    - Then the distance between the new data point and its cluster center
>>>>    is calculated. If it is less than the percentile distance value, the
>>>>    point is considered normal data. If it is greater than the percentile
>>>>    distance value, it is considered an anomaly, since it lies outside
>>>>    the cluster.
>>>>
>>>> Most of the work has been completed by now. Please let me know if there are
>>>> any issues or improvements to be made.
>>>> https://github.com/ashensw/carbon-ml/tree/fraud_detection
>>>>
>>>> Thanks and Regards,
>>>> Ashen
>>>>
>>>> --
>>>> *Ashen Weerathunga*
>>>> Software Engineer - Intern
>>>> WSO2 Inc.: http://wso2.com
>>>> lean.enterprise.middleware
>>>>
>>>> Email: [email protected]
>>>> Mobile: +94 716042995 <94716042995>
>>>> LinkedIn:
>>>> *http://lk.linkedin.com/in/ashenweerathunga
>>>> <http://lk.linkedin.com/in/ashenweerathunga>*
>>>>
>>>
>>> --
>>>
>>> Thanks & regards,
>>> Nirmal
>>>
>>> Team Lead - WSO2 Machine Learner
>>> Associate Technical Lead - Data Technologies Team, WSO2 Inc.
>>> Mobile: +94715779733
>>> Blog: http://nirmalfdo.blogspot.com/
>>>
>>
>

--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Sinnathamby Mahesan
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
_______________________________________________
Architecture mailing list
[email protected]
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
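
The boundary and prediction steps discussed in this thread can be sketched as follows. This is only a minimal illustration in plain Python, not the Spark MLlib code in the branch linked above. It assumes the cluster centers have already been produced by some K-means run, uses Euclidean distance, and uses a nearest-rank percentile; all function names here are hypothetical. It shows both the closest-cluster rule from the original post and the "outside every cluster" rule from the reply.

```python
import math


def nearest_center(point, centers):
    """Return (index, distance) of the cluster center closest to point."""
    distances = [math.dist(point, c) for c in centers]
    idx = min(range(len(centers)), key=distances.__getitem__)
    return idx, distances[idx]


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an ascending list (0 < pct <= 100)."""
    rank = max(1, math.ceil(pct / 100.0 * len(sorted_values)))
    return sorted_values[rank - 1]


def cluster_radii(points, centers, pct):
    """Percentile distance r_i of each cluster i, used as its boundary."""
    per_cluster = [[] for _ in centers]
    for p in points:
        idx, d = nearest_center(p, centers)
        per_cluster[idx].append(d)
    # Assumption of this sketch: a cluster with no points gets radius 0.0.
    return [percentile(sorted(ds), pct) if ds else 0.0 for ds in per_cluster]


def is_anomaly_closest(point, centers, radii):
    """Rule from the original post: compare with the closest cluster only."""
    idx, d = nearest_center(point, centers)
    return d > radii[idx]


def is_anomaly_all(point, centers, radii):
    """Rule from the reply: anomalous if d_i > r_i for every cluster i."""
    return all(math.dist(point, c) > r for c, r in zip(centers, radii))
```

The two rules can disagree when a point lies outside its closest cluster but inside a farther cluster that has a larger radius. For example, with centers (0, 0) and (10, 0) and radii 1 and 8, the point (2.5, 0) is an anomaly under the closest-cluster rule but not under the all-clusters rule.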
