[ 
https://issues.apache.org/jira/browse/SPARK-8540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14633601#comment-14633601
 ] 

Rakesh Chalasani commented on SPARK-8540:
-----------------------------------------

I think coupling an algorithm to a specific use case, in this case KMeans to 
anomaly detection, might not be a good idea. Why not just return the distances 
to the KMeans centers, so that the user can write a simple operation over that 
column to get the anomalies? If we return the distances, finding anomalies 
takes just one more line of code, and we can have an example showing that.
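
As a rough sketch of the idea (plain Python rather than Spark; the centers, 
points, threshold and helper name below are made up for illustration and are 
not part of any Spark API), once a distance-to-nearest-center column exists, 
both filtering options from the proposal reduce to a single expression each:

```python
import math

def nearest_center_distance(point, centers):
    """Euclidean distance from a point to its closest cluster center."""
    return min(math.dist(point, c) for c in centers)

# Hypothetical inputs: centers as a fitted KMeans model might expose them,
# plus a handful of points to score.
centers = [(0.0, 0.0), (10.0, 10.0)]
points = [(0.5, 0.2), (9.8, 10.1), (5.0, 5.0), (0.1, -0.3), (20.0, 20.0)]

# The "distance column": one distance per input point.
distances = [nearest_center_distance(p, centers) for p in points]

# (a) Outliers are points farther than a threshold from their nearest center.
threshold = 3.0  # assumed value, chosen only for this example
outliers_by_threshold = [p for p, d in zip(points, distances) if d > threshold]

# (b) Outliers are the n points with the largest distances.
n = 1  # assumed value
ranked = sorted(zip(points, distances), key=lambda pd: pd[1], reverse=True)
outliers_top_n = [p for p, _ in ranked[:n]]
```

With the distances available, the outlier policy (fixed threshold or top-n) 
stays in user code instead of being baked into the clustering algorithm.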

> KMeans-based outlier detection
> ------------------------------
>
>                 Key: SPARK-8540
>                 URL: https://issues.apache.org/jira/browse/SPARK-8540
>             Project: Spark
>          Issue Type: Sub-task
>          Components: ML
>            Reporter: Joseph K. Bradley
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> Proposal for K-Means-based outlier detection:
> * Cluster data using K-Means
> * Provide prediction/filtering functionality which returns outliers/anomalies
> ** This can take some threshold parameter which specifies either (a) how far 
> off a point needs to be to be considered an outlier or (b) how many outliers 
> should be returned.
> Note this will require a bit of API design, which should probably be posted 
> and discussed on this JIRA before implementation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
