Dear Ashen
I know you have programmed this correctly,

but here too
it is better to show that

if (di > ri) for all i = 1..k  =>  Anomalous

where k is the number of clusters
di is the distance between the point under consideration and the cluster
centre i
and
ri is the percentile radius of cluster i
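
A minimal Python sketch of this rule, assuming Euclidean distance and
that the cluster centres and percentile radii are already known. The
function name is_anomalous and the sample values are hypothetical and
are not taken from the carbon-ml code:

```python
import math

def is_anomalous(point, centres, radii):
    """Return True when the point lies outside every cluster boundary,
    i.e. d_i > r_i for all i = 1..k."""
    return all(math.dist(point, c) > r for c, r in zip(centres, radii))

# Hypothetical example with k = 2 clusters.
centres = [(0.0, 0.0), (10.0, 10.0)]
radii = [2.0, 3.0]

print(is_anomalous((1.0, 1.0), centres, radii))  # point inside cluster 1
print(is_anomalous((5.0, 5.0), centres, radii))  # point outside both clusters
```

Checking every cluster is not always the same as checking only the
closest centre: a point can lie outside the radius of its nearest
cluster yet inside the larger radius of a farther one.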


[image: Inline images 2]

:-)
Best Wishes




On 24 September 2015 at 11:43, Ashen Weerathunga <[email protected]> wrote:

> Variables of the above diagram.
>
>    - Cc1, Cc2, Cc3 - Cluster centers
>
>
>    - r1 - the user-specified percentile of the distances from all the
>    points of cluster 1 to their cluster center (Cc1)
>    (this is considered the boundary of cluster 1)
>
>
>    - d1 - distance between a particular data point and its closest cluster
>    center (Cc1)
>
>
> On Thu, Sep 24, 2015 at 11:25 AM, Ashen Weerathunga <[email protected]>
> wrote:
>
>> Thanks for the suggestion!
>>
>> This diagram shows how the algorithm detects anomalous behavior. As the
>> diagram shows, K-means clustering produces a set of clusters of normal
>> data plus some deviating points that behave as anomalies. Since we use a
>> percentile distance to define the cluster boundaries, we can exclude
>> those anomalous points from the clusters. When a new data point arrives,
>> its closest cluster center is calculated, and by comparing distances we
>> can identify whether it belongs to that cluster or not. If it does not,
>> the algorithm flags it as an anomaly.
>>
>> [image: Inline image 3]
>> Hope this gives a clearer view of the algorithm.
>>
>> Thanks,
>> Ashen
>>
>> On Wed, Sep 23, 2015 at 6:11 PM, Nirmal Fernando <[email protected]> wrote:
>>
>>> Thanks Ashen! A few diagrams will help readers understand the algorithm
>>> better.
>>>
>>> On Wed, Sep 23, 2015 at 6:03 PM, Ashen Weerathunga <[email protected]>
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I am currently integrating an anomaly detection feature into
>>>> WSO2 ML. Some anomaly/fraud detection features are already
>>>> implemented in CEP/DAS using different approaches, but this one uses
>>>> a machine learning approach, namely K-means clustering. I have used
>>>> the K-means algorithm provided by Apache Spark MLlib, which is already
>>>> used in WSO2 ML.
>>>>
>>>> This feature supports both labeled and unlabeled data. Users can build a
>>>> model using existing data and use it for prediction.
>>>>
>>>> The main steps of this feature are as follows:
>>>>
>>>>    - After the preprocessing steps, the user selects the algorithm.
>>>>    There will be two algorithms under the Anomaly Detection category:
>>>>       - K Means with Unlabeled data
>>>>       - K Means with Labeled data - for users who have labeled data
>>>>    - If the user selects the K Means with Labeled data option, the user
>>>>    should also input the normal label value(s) and the train data fraction.
>>>>    - In the next step the user inputs three parameters:
>>>>       - Maximum number of iterations
>>>>       - Number of normal clusters
>>>>       - Percentile value
>>>>    - Then the model will be built using those parameters.
>>>>    - A model summary will be provided for the labeled data option, showing
>>>>    the model accuracy measures, confusion matrix, etc.
>>>>    - In the prediction part the user has two options: upload new
>>>>    data as a CSV or TSV file, or manually enter new data values. The
>>>>    prediction shows whether each new data point is an anomaly or not.
>>>>
>>>> The methodology used is as follows:
>>>>
>>>>    - First the dataset is clustered using the K-means algorithm
>>>>    according to the hyperparameters the user provided.
>>>>    - Since in real-world anomaly detection the positive (anomaly)
>>>>    instances are very rare, we assume that those anomalies lie
>>>>    outside the clusters.
>>>>    - So we can detect them by calculating the cluster boundaries. This
>>>>    is how we identify the cluster boundaries:
>>>>       - First calculate the distances between all data points and
>>>>       their respective cluster centers.
>>>>       - Then, for each cluster, take the chosen percentile of its
>>>>       distances as the cluster boundary.
>>>>    - When a new data point arrives, its closest cluster center is
>>>>    found using the K-means predict function.
>>>>    - Then the distance between the new data point and its cluster center
>>>>    is calculated. If it is less than the percentile distance value, the
>>>>    point is considered normal. If it is greater than the percentile
>>>>    distance value, it is considered an anomaly, since it lies outside
>>>>    the cluster.
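
The boundary calculation and the prediction step described above can be
sketched in plain Python. This is only an illustration under stated
assumptions: the cluster centres are taken as given (in the feature they
come from Spark MLlib K-means), distances are Euclidean, and the
percentile uses a nearest-rank definition. All function names and the
sample data are hypothetical.

```python
import math

def closest_centre(point, centres):
    """Index of the centre nearest to the point (stand-in for K-means predict)."""
    return min(range(len(centres)), key=lambda i: math.dist(point, centres[i]))

def percentile(sorted_values, q):
    """Nearest-rank percentile of an ascending list, q in (0, 100]."""
    rank = max(1, math.ceil(q * len(sorted_values) / 100))
    return sorted_values[rank - 1]

def cluster_boundaries(points, centres, q):
    """For each cluster, the q-th percentile of the distances between its
    points and its centre."""
    distances = [[] for _ in centres]
    for p in points:
        i = closest_centre(p, centres)
        distances[i].append(math.dist(p, centres[i]))
    # An empty cluster gets a zero radius (an assumption of this sketch).
    return [percentile(sorted(d), q) if d else 0.0 for d in distances]

def predict(point, centres, boundaries):
    """Label a new point by comparing its distance to the closest centre
    with that cluster's boundary."""
    i = closest_centre(point, centres)
    return "anomaly" if math.dist(point, centres[i]) > boundaries[i] else "normal"

# Hypothetical training data: two clusters, one far-out point in the first.
centres = [(0.0, 0.0), (10.0, 10.0)]
points = [(0, 1), (1, 0), (0, 2), (0, 3), (0, 6),
          (10, 11), (11, 10), (10, 12), (10, 13)]
boundaries = cluster_boundaries(points, centres, 80)

print(boundaries)
print(predict((0.0, 2.5), centres, boundaries))
print(predict((0.0, 6.0), centres, boundaries))
```

With the 80th percentile, the far-out training point (0, 6) falls
outside the boundary of its own cluster, which is the intended effect
of using a percentile rather than the maximum distance.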
>>>>
>>>> Most of the work has been completed by now. Please let me know if there are
>>>> any issues or improvements to be made.
>>>> https://github.com/ashensw/carbon-ml/tree/fraud_detection
>>>>
>>>> Thanks and Regards,
>>>> Ashen
>>>>
>>>> --
>>>> *Ashen Weerathunga*
>>>> Software Engineer - Intern
>>>> WSO2 Inc.: http://wso2.com
>>>> lean.enterprise.middleware
>>>>
>>>> Email: [email protected]
>>>> Mobile: +94 716042995 <94716042995>
>>>> LinkedIn:
>>>> *http://lk.linkedin.com/in/ashenweerathunga
>>>> <http://lk.linkedin.com/in/ashenweerathunga>*
>>>>
>>>
>>>
>>>
>>> --
>>>
>>> Thanks & regards,
>>> Nirmal
>>>
>>> Team Lead - WSO2 Machine Learner
>>> Associate Technical Lead - Data Technologies Team, WSO2 Inc.
>>> Mobile: +94715779733
>>> Blog: http://nirmalfdo.blogspot.com/
>>>
>>>
>>>
>>
>>
>>
>
>
>
>
> _______________________________________________
> Architecture mailing list
> [email protected]
> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>
>


-- 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Sinnathamby Mahesan



~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
