[ https://issues.apache.org/jira/browse/SPARK-4001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14226562#comment-14226562 ]
Jacky Li commented on SPARK-4001:
---------------------------------

Thanks for your suggestion, Daniel. Here is the current status.

1. I have implemented Apriori and FP-growth, following YAFIM (http://pasa-bigdata.nju.edu.cn/people/ronggu/pub/YAFIM_ParLearning.pdf) and PFP (http://dl.acm.org/citation.cfm?id=1454027). For Apriori there are currently two versions: one uses a broadcast variable and the other uses a Cartesian join of two RDDs. I am testing both on the mushroom and webdoc open datasets (http://fimi.ua.ac.be/data/) to compare their performance before deciding which one to contribute to MLlib. I have updated the code in the PR (https://github.com/apache/spark/pull/2847); you are welcome to check it and try it in your use case.

2. For the input, the Apriori implementation currently takes {{RDD\[Array\[String\]\]}} as the input dataset, where each array is one transaction. It does not carry a basket_id or user_id. I am not sure whether it can easily fit your use case. Can you give more detail on how you want to use it in collaborative filtering contexts?

> Add Apriori algorithm to Spark MLlib
> ------------------------------------
>
>                 Key: SPARK-4001
>                 URL: https://issues.apache.org/jira/browse/SPARK-4001
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>            Reporter: Jacky Li
>            Assignee: Jacky Li
>
> Apriori is the classic algorithm for frequent item set mining in a
> transactional data set. It will be useful if the Apriori algorithm is added
> to MLlib in Spark.
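
For readers unfamiliar with the algorithm under discussion, here is a minimal single-machine sketch of level-wise Apriori in plain Python. It is not the code in the PR and does not use Spark. The function name `apriori`, the parameter `min_count` (an absolute support threshold), and the sample transactions are all assumptions made for illustration. The input mirrors the shape described in the comment: a collection of transactions, each a list of item strings, with no basket or user identifier.

```python
from collections import Counter
from itertools import combinations


def apriori(transactions, min_count):
    """Return {frozenset(items): support count} for every frequent itemset.

    Hypothetical helper for illustration only. `min_count` is the minimum
    number of transactions that must contain an itemset.
    """
    baskets = [frozenset(t) for t in transactions]

    # Level 1: count individual items and keep the frequent ones.
    counts = Counter(frozenset([item]) for b in baskets for item in b)
    frequent = {s: n for s, n in counts.items() if n >= min_count}
    result = dict(frequent)

    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: drop candidates that have an infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        # Count step: one pass over the data per level. This pass is the
        # part a distributed version would run in parallel.
        counts = Counter(c for b in baskets for c in candidates if c <= b)
        frequent = {s: n for s, n in counts.items() if n >= min_count}
        result.update(frequent)
        k += 1
    return result


# Made-up sample data: each inner list is one transaction.
transactions = [
    ["a", "b", "c"],
    ["a", "b"],
    ["a", "c"],
    ["b", "c"],
    ["a", "b", "c"],
]
frequent_itemsets = apriori(transactions, min_count=3)
```

Each level needs the previous level's frequent itemsets at every point where transactions are counted, so a distributed version has to decide how to get the candidate set to the data.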