[ 
https://issues.apache.org/jira/browse/SPARK-4001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177658#comment-14177658
 ] 

Xiangrui Meng commented on SPARK-4001:
--------------------------------------

I'm asking because I'm not very familiar with this algorithm. I need some 
references to understand the following:

1) how important/popular the algorithm is (paper, use cases)
2) whether it is straightforward to implement it in parallel
3) what its storage and computation complexity are
4) whether there are alternatives and how they compare

For example, for the one I mentioned (A simple algorithm for finding frequent 
elements in streams and bags):

1) it has 400 citations on Google Scholar / it finds frequent items that 
appear above a threshold in two passes
2) it is very easy to implement in parallel
3) storage is O(1/p) and it takes two passes over the data, where p is the 
threshold on the frequency
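
To make the two-pass / O(1/p) claim concrete, here is a minimal single-machine 
sketch of the counter-based scheme as I understand it from that paper. The 
function names (candidates, frequent_items) are hypothetical and not part of 
MLlib, and the data is assumed to be an in-memory list so that it can be 
scanned twice; treat it as an illustration rather than a proposed 
implementation.

```python
from collections import Counter
from math import ceil


def candidates(stream, p):
    """Pass 1: return a superset of the items whose frequency exceeds p.

    At most ceil(1/p) - 1 counters are alive at any time, so the
    storage is O(1/p) regardless of the stream length.
    Assumes 0 < p <= 1.
    """
    k = ceil(1 / p)
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement every counter and drop the
            # ones that reach zero (the new item is discarded too).
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return set(counters)


def frequent_items(data, p):
    """Pass 2: count the candidates exactly and keep the true positives."""
    cands = candidates(data, p)
    counts = Counter(x for x in data if x in cands)
    n = len(data)
    return {x: c for x, c in counts.items() if c > p * n}
```

The first pass can report false positives but never misses an item that 
occurs more than p * n times, which is why the exact recount in the second 
pass is needed.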

> Add Apriori algorithm to Spark MLlib
> ------------------------------------
>
>                 Key: SPARK-4001
>                 URL: https://issues.apache.org/jira/browse/SPARK-4001
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>            Reporter: Jacky Li
>            Assignee: Jacky Li
>
> Apriori is the classic algorithm for frequent item set mining in a 
> transactional data set.  It would be useful to add the Apriori algorithm to 
> MLlib in Spark.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
