Hi all,

Could anyone provide pointers on how to extend the Spark FPGrowth implementation with either of the following stopping criteria:
* maximum number of generated itemsets,
* maximum length of generated itemsets (i.e. the number of items in an itemset).

The second criterion is available, for example, in Christian Borgelt's FP-growth implementation [1] through the -n# switch. We have experience with Apriori but not with parallel FP-Growth, so any guidance will be welcome.

The reason we need this: without these extra constraints we keep running into combinatorial explosion problems, as documented on the UCI Audiology dataset [2].

Thanks,

-- Tomas

[1] http://www.borgelt.net/doc/fpgrowth/fpgrowth.html
[2] https://issues.apache.org/jira/browse/SPARK-12163
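For concreteness, here is a plain-Python sketch (not Spark code; the function name and structure are hypothetical) of why the length criterion has to be applied during generation rather than as a post-filter: a depth-first itemset enumerator that refuses to extend any prefix once it reaches max_len, so the bounded search never materializes the long itemsets that cause the blow-up. An equivalent check in FP-Growth would cut off the recursive conditional-tree projection at the same depth.

```python
from collections import defaultdict

def frequent_itemsets(transactions, min_support, max_len):
    """DFS enumeration of frequent itemsets with a cap on itemset length.

    Illustrative sketch only: a simple Eclat-style search, not the
    Spark/MLlib implementation. The max_len check prunes a branch
    before it is ever expanded, which is what avoids the explosion.
    """
    # Count single items and keep only the frequent ones, in a fixed order.
    counts = defaultdict(int)
    for t in transactions:
        for item in set(t):
            counts[item] += 1
    items = sorted(i for i, c in counts.items() if c >= min_support)

    results = []

    def dfs(prefix, tids):
        if len(prefix) >= max_len:
            return  # stopping criterion: do not grow itemsets past max_len
        # Only extend with items that come after the last prefix item,
        # so each itemset is generated exactly once.
        start = items.index(prefix[-1]) + 1 if prefix else 0
        for item in items[start:]:
            sub = [t for t in tids if item in t]
            if len(sub) >= min_support:
                ext = prefix + [item]
                results.append((frozenset(ext), len(sub)))
                dfs(ext, sub)

    dfs([], [set(t) for t in transactions])
    return results
```

With max_len=1 this returns only frequent single items; raising max_len admits longer itemsets branch by branch. A cap on the *number* of itemsets (the first criterion) could be enforced the same way, by returning from dfs once len(results) reaches the limit.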