[ 
https://issues.apache.org/jira/browse/SPARK-4038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207925#comment-14207925
 ] 

Ashutosh Trivedi edited comment on SPARK-4038 at 11/17/14 2:18 AM:
-------------------------------------------------------------------

I think I am following the procedure. I opened a discussion on the dev mailing 
list, and [~mengxr] asked me to open this JIRA. As the description says, this 
JIRA is meant to discuss various outlier/anomaly detection algorithms. I don't 
just 'care to code' in Spark. I use Spark for my projects and found that it 
has no outlier detection algorithms. I think it should have some, and I can 
contribute. I am aware of one algorithm, AVF (link attached).

The questions raised are valid, and we want the community to discuss them.

This algorithm deals with categorical data. It uses the simplest approach: 
calculating the frequency of each attribute value in the data set. Some people 
in the community are already reviewing it, and I am working on it.
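
To make the frequency idea concrete, here is a minimal sketch in plain Python. It is not Spark code and not the implementation under review. It assumes the usual reading of AVF, where a row's score is the average count of its attribute values and the lowest-scoring rows are the outlier candidates. The function name `avf_scores` and the sample rows are made up for illustration.

```python
from collections import Counter

def avf_scores(rows):
    """Return one score per row: the mean frequency of its attribute values.

    Assumption: every row has the same number of categorical attributes.
    """
    if not rows:
        return []
    num_attrs = len(rows[0])
    # One frequency table per attribute (column), filled in one pass over the data.
    counts = [Counter() for _ in range(num_attrs)]
    for row in rows:
        for j, value in enumerate(row):
            counts[j][value] += 1
    # Rows built from rare values end up with low scores.
    return [
        sum(counts[j][value] for j, value in enumerate(row)) / num_attrs
        for row in rows
    ]

# Hypothetical toy data: (colour, size)
rows = [
    ("red", "small"),
    ("red", "small"),
    ("red", "large"),
    ("blue", "tiny"),
]
scores = avf_scores(rows)
# Index of the row with the lowest score, i.e. the strongest outlier candidate.
most_unusual = min(range(len(rows)), key=scores.__getitem__)
```

In this toy data the last row has two values that each appear only once, so it receives the lowest score. How many of the lowest-scoring rows to report as outliers is a separate choice that this sketch leaves open.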

I did not find any other algorithm that works on categorical data to find 
outliers. If you are aware of another well-known algorithm, please share it 
with us.

  



> Outlier Detection Algorithm for MLlib
> -------------------------------------
>
>                 Key: SPARK-4038
>                 URL: https://issues.apache.org/jira/browse/SPARK-4038
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>            Reporter: Ashutosh Trivedi
>            Priority: Minor
>
> The aim of this JIRA is to discuss which parallel outlier detection 
> algorithms can be included in MLlib. 
> The one I am familiar with is Attribute Value Frequency (AVF). It 
> scales linearly with the number of data points and attributes, and relies on 
> a single data scan. It is not distance based and is well suited to 
> categorical data. The original paper also gives a parallel version, which is 
> not complicated to implement. I am working on the implementation and will 
> soon submit the initial code for review.
> Here is the link to the paper:
> http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=4410382
> As pointed out by Xiangrui in the discussion at
> http://apache-spark-developers-list.1001551.n3.nabble.com/MLlib-Contributing-Algorithm-for-Outlier-Detection-td8880.html
> there are other algorithms as well. Let's discuss which would be more 
> general and easier to parallelize.
>    
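
One reason a frequency-based method parallelizes easily is that per-partition counts can simply be added together. The sketch below shows that idea in plain Python with ordinary lists standing in for data partitions. It is an assumption about how such a scheme could look, not the parallel version from the paper and not Spark code; the names `count_partition`, `merge_counts`, and `score_row` are hypothetical.

```python
from collections import Counter
from functools import reduce

def count_partition(partition, num_attrs):
    """Count attribute values within one partition (the local, 'map'-like step)."""
    counts = [Counter() for _ in range(num_attrs)]
    for row in partition:
        for j, value in enumerate(row):
            counts[j][value] += 1
    return counts

def merge_counts(left, right):
    """Add two lists of per-attribute counters (the 'reduce'-like step)."""
    return [a + b for a, b in zip(left, right)]

def score_row(row, counts):
    """Mean global frequency of the row's attribute values."""
    return sum(counts[j][value] for j, value in enumerate(row)) / len(row)

# Hypothetical toy data split into two partitions.
partitions = [
    [("red", "small"), ("red", "small")],
    [("red", "large"), ("blue", "tiny")],
]
num_attrs = 2

local_counts = [count_partition(p, num_attrs) for p in partitions]
global_counts = reduce(merge_counts, local_counts)
# Each partition can score its own rows once it has the merged counts.
scores = [[score_row(row, global_counts) for row in p] for p in partitions]
```

Because the merge is just addition, the order in which partitions are combined does not matter. The merged table has one entry per distinct attribute value, so it stays small whenever the attributes have few distinct values.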



