Hi all,

Could anyone provide pointers on how to extend the Spark FPGrowth
implementation with either of the following stopping criteria:

* maximum number of generated itemsets,
* maximum length of generated itemsets (i.e. the number of items in an itemset).

The second criterion is available, for example, in Christian Borgelt's
FP-Growth implementation [1] through the -n# switch.
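
As far as we can tell, the public API today only lets us filter after the
fact, roughly like the sketch below (made-up data and thresholds); that
trims the output but does nothing against the explosion during mining
itself:

    import org.apache.spark.mllib.fpm.FPGrowth
    import org.apache.spark.{SparkConf, SparkContext}

    object BoundedFPGrowth {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("BoundedFPGrowth").setMaster("local[*]"))

        // Tiny stand-in for the real transaction data.
        val transactions = sc.parallelize(Seq(
          Array("a", "b", "c"),
          Array("a", "b"),
          Array("a", "c"),
          Array("b", "c", "d")))

        val model = new FPGrowth()
          .setMinSupport(0.5)
          .setNumPartitions(2)
          .run(transactions)

        // Second criterion, applied after the fact: drop long itemsets.
        val maxLen = 2
        val bounded = model.freqItemsets.filter(_.items.length <= maxLen)

        // First criterion, applied after the fact: cap how many we report.
        val maxItemsets = 1000
        bounded.take(maxItemsets).foreach { fi =>
          println(s"${fi.items.mkString("{", ",", "}")}: ${fi.freq}")
        }

        sc.stop()
      }
    }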

We have experience with Apriori but not with parallel FP-Growth, so any
guidance would be welcome.

The reason we need this: without these extra constraints we keep running
into combinatorial explosion problems, as documented on the UCI Audiology
dataset [2].
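
To make concrete what we are after (a check that cuts the search off
during mining rather than afterwards), here is a small self-contained toy
sketch of a pattern-growth enumeration with a maximum-length cut-off. It
is not Spark's FPGrowth code, only an illustration of where such a check
would have to sit so that long branches are never explored:

    object PatternGrowthSketch {

      // Mine itemsets of at most `maxLen` items with support >= `minCount`;
      // `suffix` is the pattern grown so far.
      def mine(transactions: Seq[Set[String]],
               minCount: Long,
               maxLen: Int,
               suffix: List[String] = Nil): Seq[(List[String], Long)] = {
        if (suffix.length >= maxLen) {
          Seq.empty // length-based stopping criterion: do not grow further
        } else {
          val counts = transactions.flatMap(_.toSeq)
            .groupBy(identity)
            .map { case (item, occ) => (item, occ.size.toLong) }
          val frequent = counts.filter { case (_, c) => c >= minCount }.keys.toSeq.sorted
          frequent.flatMap { item =>
            val pattern = item :: suffix
            // Project on `item`, keeping only smaller items so that each
            // itemset is enumerated exactly once, then grow the pattern.
            val projected = transactions.filter(_.contains(item)).map(_.filter(_ < item))
            (pattern, counts(item)) +: mine(projected, minCount, maxLen, pattern)
          }
        }
      }

      def main(args: Array[String]): Unit = {
        val db = Seq(Set("a", "b", "c"), Set("a", "b"), Set("a", "c"), Set("b", "c"))
        // With maxLen = 1 only single items come out; with maxLen = 2 pairs as well.
        mine(db, minCount = 2, maxLen = 2).foreach(println)
      }
    }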

Thanks,
-- 
Tomas

[1] http://www.borgelt.net/doc/fpgrowth/fpgrowth.html
[2] https://issues.apache.org/jira/browse/SPARK-12163
