[ 
https://issues.apache.org/jira/browse/SPARK-4001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177707#comment-14177707
 ] 

Sean Owen commented on SPARK-4001:
----------------------------------

FWIW, I do perceive Apriori to be *the* basic frequent itemset algorithm. I 
think this is the original paper -- at least it was cited on Wikipedia and has 
the right date and author: 
http://rakesh.agrawal-family.com/papers/vldb94apriori.pdf  It is very simple, 
and probably what you'd come up with if you had to invent a solution to the 
problem yourself: 
http://en.wikipedia.org/wiki/Apriori_algorithm

Frequent itemset mining is not quite the same as a frequent item algorithm. 
Given a bunch of sets of items, it tries to determine which subsets of items 
occur together frequently.

FP-Growth is the only other itemset algorithm I have ever heard of. It's more 
sophisticated. I don't have a paper reference for it.

If you're going to implement frequent itemsets, I think these are the two to 
start with. That said, I perceive frequent itemsets to be kind of "90s", and I 
have never had to use them myself. That is not to say they have no use, and 
hey, they're simple. I suppose my problem with this type of technique is that 
it doesn't tell you whether a set occurred *unusually* frequently, just that 
it occurred frequently in absolute terms. There is no probabilistic element to 
these methods.
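That distinction -- frequent in absolute terms vs. unusually frequent -- can be 
illustrated with a lift calculation. The numbers below are made up for the 
example, not from this thread: support alone flags a co-occurrence as frequent, 
while lift compares it with what independence would predict.

```python
# Toy numbers (illustrative only): "bread" is in 80% of baskets,
# "milk" in 75%, and the pair {bread, milk} in 60% of baskets.
support_bread = 0.80
support_milk = 0.75
support_both = 0.60

# Apriori-style support just says {bread, milk} is frequent (0.60).
# Lift compares observed co-occurrence with the independence baseline:
lift = support_both / (support_bread * support_milk)
print(round(lift, 2))  # 1.0 -> exactly what chance predicts; nothing unusual
```

Here the pair clears any reasonable support threshold, yet its lift of 1.0 
means the two items co-occur no more often than independent items would.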


> Add Apriori algorithm to Spark MLlib
> ------------------------------------
>
>                 Key: SPARK-4001
>                 URL: https://issues.apache.org/jira/browse/SPARK-4001
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>            Reporter: Jacky Li
>            Assignee: Jacky Li
>
> Apriori is the classic algorithm for frequent itemset mining in a 
> transactional data set.  It would be useful if the Apriori algorithm were 
> added to MLlib in Spark.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
