[ https://issues.apache.org/jira/browse/SPARK-6143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon updated SPARK-6143:
--------------------------------
Labels: bulk-closed (was: )
> Improve FP-Growth for mining closed-forms of frequent patterns
> --------------------------------------------------------------
>
> Key: SPARK-6143
> URL: https://issues.apache.org/jira/browse/SPARK-6143
> Project: Spark
> Issue Type: Improvement
> Components: MLlib
> Reporter: Denis Dus
> Priority: Minor
> Labels: bulk-closed
>
> Closed forms of frequent itemsets (and of patterns in general) are more
> convenient for a person to analyze.
> An itemset X is closed in a data set D if there exists no proper super-itemset Y
> such that Y has the same support as X in D. So, closed frequent itemsets are just
> a lossless compression of all frequent itemsets.
> 1) A naive approach is to find all frequent itemsets and then remove each one
> that is a proper subset of another frequent itemset with the same
> support. But this can be very costly, as all frequent itemsets still have to
> be generated.
> 2) A more powerful idea is to use some kind of merging during the mining
> process. I've heard about the FPClose algorithm, which is based on FP-Growth:
> [http://users.encs.concordia.ca/~grahne/papers/fimi03.pdf] (Section 4 of the
> paper)
> I think this would be useful for MLlib users who are interested in
> frequent itemset analysis.
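
To make the definition and the naive approach from 1) concrete, here is a minimal sketch in plain Python. It is not MLlib code and not FPClose: the function names (frequent_itemsets, closed_itemsets, support_from_closed) and the toy transactions are made up for illustration, and the frequent itemsets are enumerated by brute force rather than with FP-Growth.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Brute-force enumeration (illustration only, exponential in the item count).

    Returns a dict mapping each frequent itemset (frozenset) to its support count.
    """
    items = sorted({item for t in transactions for item in t})
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            cand = frozenset(candidate)
            support = sum(1 for t in transactions if cand <= t)
            if support >= min_support:
                result[cand] = support
    return result


def closed_itemsets(frequent):
    """Naive post-filter: drop every itemset that has a proper superset
    with the same support. Only frequent supersets need checking, because
    a superset with equal support is frequent as well.
    """
    return {
        x: sup
        for x, sup in frequent.items()
        if not any(x < y and sup == sup_y for y, sup_y in frequent.items())
    }


def support_from_closed(itemset, closed):
    """Recover the support of a frequent itemset from the closed ones:
    it is the largest support among its closed supersets.
    Returns 0 when the itemset has no closed superset (i.e. it is not frequent).
    """
    return max((s for y, s in closed.items() if itemset <= y), default=0)


# Hypothetical toy data set D with four transactions.
transactions = [frozenset(t) for t in ({"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"a"})]

frequent = frequent_itemsets(transactions, min_support=2)
closed = closed_itemsets(frequent)

for itemset, support in sorted(closed.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), support)
```

With this toy data, {b} and {c} are frequent but not closed, since {a, b} and {a, c} have the same support of 2. Their supports can still be read back with support_from_closed, which is what "lossless" means here. The filter compares every pair of frequent itemsets, so it shows why this approach gets costly and why merging during mining, as FPClose does, is attractive.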
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)