Hi Xiangrui,

Thanks for the reply. AVF is not difficult to implement in parallel. It just calculates the frequency of each attribute value and then computes an overall 'score' for each data point. Low-scoring points are considered outliers. One advantage is that it does not calculate distances, so in that sense it is general.
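To make the scoring concrete, here is a minimal single-machine sketch in Python. It assumes the score of a data point is the mean, over its attributes, of how often each of its attribute values occurs in the data set; that averaging is my reading of AVF, and the function names `avf_scores` and `lowest_scoring` are hypothetical, not MLlib API. The counting step is a plain per-attribute aggregation, which is the part that would be distributed.

```python
from collections import Counter


def avf_scores(rows):
    """Return one score per row: the mean frequency of its attribute values.

    Assumption: every row is a sequence of categorical values with the same
    number of attributes.
    """
    if not rows:
        return []
    n_attrs = len(rows[0])
    # One counter per attribute (column): how often each value occurs.
    counts = [Counter(row[j] for row in rows) for j in range(n_attrs)]
    # Score each row by averaging the counts of the values it contains.
    return [
        sum(counts[j][row[j]] for j in range(n_attrs)) / n_attrs
        for row in rows
    ]


def lowest_scoring(rows, k):
    """Return the indices of the k rows with the lowest scores."""
    scores = avf_scores(rows)
    return sorted(range(len(rows)), key=lambda i: scores[i])[:k]
```

No distance between points is computed anywhere, only value counts, so the same code works for any categorical attributes.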
I have to look at the one you pointed out. It calculates the hat matrix, and I am not sure about calculating the hat matrix in parallel, but the Mahalanobis distance can be implemented.
http://en.wikipedia.org/wiki/Mahalanobis_distance

I have opened the JIRA:
https://issues.apache.org/jira/browse/SPARK-4038

Let's discuss it over there.

Regards,
Ashutosh