The Apriori algorithm has (I think) really bad scaling properties. As such, I would suggest that Robin's PFGrowth work pretty much takes the place of MAHOUT-108.
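
To make the scaling concern concrete, here is a minimal single-machine sketch of the level-wise Apriori procedure as the quoted issue describes it (breadth-first search, candidate generation, downward closure). It is only an illustration, not code from Mahout or from any patch attached to the issue; the names `apriori`, `baskets`, and `min_support` are made up for the example. The point to notice is that every itemset length needs its own full pass over the transactions, which in a MapReduce setting would mean roughly one job per level, and that the candidate set is rebuilt at every level.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return ({itemset: support count}, number of passes over the data)."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items (first full pass over the data).
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)
    passes = 1

    k = 2
    while frequent:
        # Candidate generation: join frequent (k-1)-itemsets, keeping only
        # candidates whose (k-1)-subsets are all frequent (downward closure).
        prev = set(frequent)
        candidates = set()
        for a in prev:
            for b in prev:
                union = a | b
                if len(union) == k and all(
                    frozenset(sub) in prev
                    for sub in combinations(union, k - 1)
                ):
                    candidates.add(union)
        if not candidates:
            break

        # Support counting: one more full pass over the data for this level.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        passes += 1

        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1

    return result, passes


# Toy data (hypothetical), echoing the onions/potatoes/beef example.
baskets = [
    {"onions", "potatoes", "beef"},
    {"onions", "potatoes", "beef"},
    {"onions", "potatoes"},
    {"beef", "milk"},
]
frequent_itemsets, passes = apriori(baskets, min_support=2)
```

Even on this toy input the data is scanned three times, once per itemset length. FP-Growth, by contrast, scans the data twice to build its tree and then mines the tree without generating candidates, which is the usual argument for preferring it at scale.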

On Sun, Sep 13, 2009 at 6:29 AM, Isabel Drost (JIRA) <[email protected]> wrote:

>
> [ https://issues.apache.org/jira/browse/MAHOUT-108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12754694#action_12754694 ]
>
> Isabel Drost commented on MAHOUT-108:
> -------------------------------------
>
> Contacted (at least tried to) Chao Deng asking for the status and if I
> could help him submit the patch. Should we close this issue as won't fix or
> defer it to a later version if he does not respond? Or is anyone else up to
> implementing a patch for this task until 0.2?
>
> > Implementation of Assoication Rules learning by Apriori algorithm
> > -----------------------------------------------------------------
> >
> >                 Key: MAHOUT-108
> >                 URL: https://issues.apache.org/jira/browse/MAHOUT-108
> >             Project: Mahout
> >          Issue Type: Task
> >         Environment: Linux, Hadoop-0.17.1
> >            Reporter: chao deng
> >             Fix For: 0.2
> >
> >   Original Estimate: 504h
> >  Remaining Estimate: 504h
> >
> > Target: Association Rules learning is a popular method for discovering
> > interesting relations between variables in large databases. Here, we would
> > implement the Apriori algorithm using Hadoop & MapReduce parallel techniques.
> >
> > Applications: Typically, association rules learning is used to discover
> > regularities between products in large scale transaction data in
> > supermarkets. For example, the rule "{onions, potatoes}->beef" found in the
> > sales data would indicate that if a customer buys onions and potatoes
> > together, he or she is likely to also buy beef. Such information can be used
> > as the basis for decisions about marketing activities. In addition to the
> > market basket analysis, association rules are employed today in many
> > application areas including Web usage mining, intrusion detection and
> > bioinformatics.
> >
> > Apriori algorithm: Apriori is the best-known algorithm to mine
> > association rules. It uses a breadth-first search strategy to count the
> > support of itemsets and uses a candidate generation function which exploits
> > the downward closure property of support
>
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
>

--
Ted Dunning, CTO
DeepDyve
