Hi Xiangrui,

Thanks for the reply. AVF is not difficult to implement in parallel. It
just counts the frequency of each attribute value and combines those counts
into an overall 'score' for each data point. Low-scoring points are
considered outliers. One advantage is that it does not compute distances,
so it does not depend on a choice of distance metric.
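
To make the scoring concrete, here is a minimal single-machine sketch. It
assumes categorical attributes and takes the score to be the mean frequency
of a point's attribute values. The name avf_scores and the sample rows are
made up for illustration and are not from any existing MLlib code.

```python
from collections import Counter


def avf_scores(rows):
    """Return one AVF score per row: the mean frequency of its attribute values."""
    if not rows:
        return []
    n_attrs = len(rows[0])
    # One frequency table per attribute (column). Each table is a plain
    # count-by-key, which is the part that should distribute easily.
    counts = [Counter(row[j] for row in rows) for j in range(n_attrs)]
    return [
        sum(counts[j][row[j]] for j in range(n_attrs)) / n_attrs
        for row in rows
    ]


# Hypothetical data: the last row uses values that appear nowhere else.
rows = [
    ("red", "small"),
    ("red", "small"),
    ("red", "large"),
    ("blue", "tiny"),
]
scores = avf_scores(rows)
# Index of the lowest-scoring row, i.e. the strongest outlier candidate.
most_unusual = min(range(len(rows)), key=lambda i: scores[i])
```

The lowest-scoring rows are the outlier candidates. How many of them to
report is a separate choice that this sketch leaves open.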

I will have to look at the one you pointed out. It computes the hat matrix,
and I am not sure how to compute the hat matrix in parallel, but Mahalanobis
distance can be implemented: http://en.wikipedia.org/wiki/Mahalanobis_distance
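
As a rough sketch of why this one looks feasible: the mean and covariance
are built from sums over the data, so they can be accumulated per partition
and merged. The code below is a single-machine illustration only. The
function names are made up, and it assumes the covariance matrix is
non-singular.

```python
def mean_and_cov(points):
    """Sample mean and covariance; both are sums over points, so they can be merged."""
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    cov = [
        [
            sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in points) / (n - 1)
            for j in range(d)
        ]
        for i in range(d)
    ]
    return mean, cov


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]  # fails if cov is singular (assumed not)
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][c] * x[c] for c in range(i + 1, n))) / m[i][i]
    return x


def mahalanobis(x, mean, cov):
    """sqrt((x - mean)^T cov^{-1} (x - mean)), without forming the inverse."""
    diff = [xi - mi for xi, mi in zip(x, mean)]
    z = solve(cov, diff)
    return sum(di * zi for di, zi in zip(diff, z)) ** 0.5
```

Points with a large distance from the mean would be the outlier candidates.
With an identity covariance this reduces to the ordinary Euclidean distance.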

I have opened a JIRA:
 https://issues.apache.org/jira/browse/SPARK-4038
Let's discuss it over there.

Regards,
Ashutosh



