[ https://issues.apache.org/jira/browse/SPARK-4001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177658#comment-14177658 ]
Xiangrui Meng commented on SPARK-4001:
--------------------------------------
I'm asking because I'm not very familiar with this algorithm. I need some
references to understand the following:
1) how important/popular the algorithm is (paper, use cases)
2) whether it is straightforward to implement it in parallel
3) what its storage and computational complexity are
4) whether there are alternatives and how they compare
For example, for the one I mentioned ("A simple algorithm for finding frequent
elements in streams and bags"):
1) it has 400 citations on Google Scholar / it finds the frequent items that
appear above a threshold in two passes
2) it is very easy to implement in parallel
3) storage is O(1/p) and it needs two passes, where p is the frequency threshold
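A minimal sketch of the two-pass idea, assuming the data is already split into in-memory partitions (a plain list of lists standing in for Spark partitions). The function names `candidate_items` and `frequent_items` are hypothetical. The first pass uses a Misra-Gries-style counter summary in the spirit of the paper, keeping at most ceil(1/p) counters; combining partitions by taking the union of their candidate sets is a simplification chosen here, not necessarily what the paper describes.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Set


def candidate_items(partition: Sequence[Hashable], p: float) -> Set[Hashable]:
    """Pass 1: keep at most k = ceil(1/p) counters, i.e. O(1/p) storage.

    Every item occurring at least p * len(partition) times survives;
    false positives are possible and are removed in pass 2.
    """
    k = math.ceil(1 / p)
    counters: Dict[Hashable, int] = {}
    for x in partition:
        if x in counters:
            counters[x] += 1
        elif len(counters) < k:
            counters[x] = 1
        else:
            # Cancel this occurrence of x together with one occurrence
            # of every tracked item; drop counters that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return set(counters)


def frequent_items(partitions: List[Sequence[Hashable]], p: float) -> Dict[Hashable, int]:
    """Return items whose total count exceeds p * N, with exact counts."""
    # Pass 1, independently per partition: an item above the global threshold
    # must be above the local threshold in at least one partition, so the
    # union of the per-partition candidates is a superset of the answer.
    candidates: Set[Hashable] = set()
    for part in partitions:
        candidates |= candidate_items(part, p)
    # Pass 2, independently per partition: count only the candidates exactly,
    # then sum the partial counts.
    counts: Counter = Counter()
    for part in partitions:
        counts.update(x for x in part if x in candidates)
    n = sum(len(part) for part in partitions)
    return {x: c for x, c in counts.items() if c > p * n}
```

Only candidate sets and candidate counts cross partition boundaries, which is what makes both passes easy to distribute (for example, one `RDD.mapPartitions` pass each followed by a reduce). The trade-off of the union is that up to P * ceil(1/p) candidates may be carried into the second pass for P partitions.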
> Add Apriori algorithm to Spark MLlib
> ------------------------------------
>
> Key: SPARK-4001
> URL: https://issues.apache.org/jira/browse/SPARK-4001
> Project: Spark
> Issue Type: New Feature
> Components: MLlib
> Reporter: Jacky Li
> Assignee: Jacky Li
>
> Apriori is the classic algorithm for frequent itemset mining in a
> transactional data set. It would be useful to add the Apriori algorithm to
> MLlib in Spark.
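For comparison with the questions above, a minimal in-memory sketch of Apriori's level-wise structure: join frequent (k-1)-itemsets into k-item candidates, prune any candidate with an infrequent (k-1)-subset, then count support with one pass over the transactions. It assumes transactions fit in memory as Python sets and uses an absolute support threshold; the names `apriori` and `min_support` are hypothetical, and this is not a distributed or optimized implementation.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def apriori(transactions: List[Set[str]], min_support: int) -> Dict[FrozenSet[str], int]:
    """Return every itemset contained in at least min_support transactions."""
    # Level 1: count single items.
    counts: Dict[FrozenSet[str], int] = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        prev = list(frequent)
        candidates: Set[FrozenSet[str]] = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                # Join step: combine two frequent (k-1)-itemsets into a k-itemset.
                union = prev[i] | prev[j]
                if len(union) != k:
                    continue
                # Prune step: every (k-1)-subset must itself be frequent.
                if all(frozenset(sub) in frequent for sub in combinations(union, k - 1)):
                    candidates.add(union)
        # Count step: one pass over the transactions per level.
        level_counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    level_counts[c] += 1
        frequent = {s: c for s, c in level_counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result
```

Each level costs a full pass over the transactions, and the number of candidates can grow combinatorially with the number of frequent items, which is relevant to questions 2) and 3).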
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)