Thanks for the suggestion! This diagram shows how the algorithm detects anomalous behavior. As the diagram shows, K-means clustering produces a set of clusters of normal data plus some deviating points that behave as anomalies. Because we use a percentile distance to define each cluster boundary, we can exclude those anomalous points from the clusters. When a new data point arrives, we find its closest cluster center and then compare its distance to that center against the cluster's percentile distance to decide whether it belongs to the cluster. If it does not, the algorithm flags it as an anomaly.
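To make the boundary check concrete, here is a minimal sketch in plain Python. It is not the Spark MLlib based code in the branch, only an illustration under a few assumptions: the cluster centers are already fitted (for example, taken from the `clusterCenters` of a Spark MLlib `KMeansModel`), the distance is Euclidean, the percentile uses the nearest-rank rule, and a point lying exactly on the boundary counts as normal. All function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def closest_center(point, centers):
    """Index of the cluster center nearest to the point."""
    return min(range(len(centers)), key=lambda i: euclidean(point, centers[i]))


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an ascending list (assumed rule)."""
    rank = max(1, math.ceil(pct / 100.0 * len(sorted_values)))
    return sorted_values[rank - 1]


def cluster_boundaries(points, centers, pct):
    """Per-cluster boundary: the pct-th percentile of member distances."""
    distances = [[] for _ in centers]
    for p in points:
        i = closest_center(p, centers)
        distances[i].append(euclidean(p, centers[i]))
    # An empty cluster gets a boundary of 0.0 (assumption for this sketch).
    return [percentile(sorted(d), pct) if d else 0.0 for d in distances]


def is_anomaly(point, centers, boundaries):
    """True if the point lies farther from its closest center than the boundary."""
    i = closest_center(point, centers)
    return euclidean(point, centers[i]) > boundaries[i]


# Hypothetical training data and pre-fitted centers.
centers = [(0.0, 0.0), (10.0, 10.0)]
training = [(0, 1), (1, 0), (0, 2), (0, 5), (10, 11), (11, 10), (10, 13)]
boundaries = cluster_boundaries(training, centers, pct=75)
```

With these values, `is_anomaly((0, 4), centers, boundaries)` checks the new point against the boundary of its closest cluster only, which is the comparison described above. A lower percentile tightens the boundaries and flags more points.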
[image: Inline image 3]

Hope this gives a clearer view of the algorithm.

Thanks,
Ashen

On Wed, Sep 23, 2015 at 6:11 PM, Nirmal Fernando <[email protected]> wrote:

> Thanks Ashen! A few diagrams will help readers understand the algorithm
> better.
>
> On Wed, Sep 23, 2015 at 6:03 PM, Ashen Weerathunga <[email protected]> wrote:
>
>> Hi all,
>>
>> I am currently integrating an anomaly detection feature into WSO2 ML.
>> Some anomaly/fraud detection features are already implemented in
>> CEP/DAS using different approaches, but this one uses a machine
>> learning approach, namely K-means clustering. I have used the K-means
>> algorithm provided by Apache Spark MLlib, which is already used in
>> WSO2 ML.
>>
>> This feature supports both labeled and unlabeled data. The user can
>> build a model from existing data and use it for prediction.
>>
>> The main steps of this feature are as follows:
>>
>> - After the preprocessing steps, the user selects the algorithm.
>>   There will be two algorithms under the Anomaly Detection category:
>>    - K Means with Unlabeled data
>>    - K Means with Labeled data - users who have labeled data can
>>      choose this option
>>    - If the user selects the K Means with Labeled data option, the
>>      user should also input the normal label value(s) and the train
>>      data fraction.
>> - In the next step the user inputs three parameters:
>>    - Maximum number of iterations
>>    - Number of normal clusters
>>    - Percentile value
>> - Then the model is built using those parameters.
>> - For the labeled data option, a model summary is provided which
>>   shows the model accuracy measures, confusion matrix, etc.
>> - In the prediction part the user has two options: upload new data
>>   as a CSV or TSV file, or enter new data values manually. The
>>   prediction shows whether each new data point is an anomaly or not.
>>
>> The methodology used is as follows:
>>
>> - First the dataset is clustered using the K-means algorithm,
>>   according to the hyperparameters that the user provided.
>> - Since in real-world anomaly detection scenarios the positive
>>   (anomaly) instances are very rare, we assume that those anomalies
>>   lie outside the clusters.
>> - So we can detect them by calculating the cluster boundaries. This
>>   is how we identify the cluster boundaries:
>>    - First calculate the distances between all data points and their
>>      respective cluster centers.
>>    - Then, for each cluster, select the percentile value of its
>>      distances as the cluster boundary.
>> - When a new data point arrives, its closest cluster center is
>>   found using the K-means predict function.
>> - Then the distance between the new data point and its cluster
>>   center is calculated. If it is less than the percentile distance
>>   value, the point is considered normal. If it is greater than the
>>   percentile distance value, the point is considered an anomaly,
>>   since it lies outside the cluster.
>>
>> Most of the work is complete by now. Please let me know if there are
>> any issues or improvements to be made.
>> https://github.com/ashensw/carbon-ml/tree/fraud_detection
>>
>> Thanks and Regards,
>> Ashen
>>
>> --
>> *Ashen Weerathunga*
>> Software Engineer - Intern
>> WSO2 Inc.: http://wso2.com
>> lean.enterprise.middleware
>>
>> Email: [email protected]
>> Mobile: +94 716042995 <94716042995>
>> LinkedIn: *http://lk.linkedin.com/in/ashenweerathunga
>> <http://lk.linkedin.com/in/ashenweerathunga>*
>>
>
> --
>
> Thanks & regards,
> Nirmal
>
> Team Lead - WSO2 Machine Learner
> Associate Technical Lead - Data Technologies Team, WSO2 Inc.
> Mobile: +94715779733
> Blog: http://nirmalfdo.blogspot.com/

--
*Ashen Weerathunga*
Software Engineer - Intern
WSO2 Inc.: http://wso2.com
lean.enterprise.middleware

Email: [email protected]
Mobile: +94 716042995 <94716042995>
LinkedIn: *http://lk.linkedin.com/in/ashenweerathunga
<http://lk.linkedin.com/in/ashenweerathunga>*
_______________________________________________
Architecture mailing list
[email protected]
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
