Denis Dus created SPARK-6143:
--------------------------------
Summary: Improve FP-Growth for mining closed-forms of frequent patterns
Key: SPARK-6143
URL: https://issues.apache.org/jira/browse/SPARK-6143
Project: Spark
Issue Type: Improvement
Components: MLlib
Reporter: Denis Dus
Priority: Minor
It is more convenient for a person to analyze the closed forms of frequent
itemsets (and of frequent patterns in general).
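An itemset X is closed in a data set D if there exists no proper super-itemset Y
such that Y has the same support as X in D. The closed frequent itemsets are
therefore a lossless compression of all frequent itemsets: every frequent
itemset is a subset of some closed frequent itemset, and its support equals the
maximum support among its closed supersets.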
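To make the definition concrete, here is a small Python sketch that checks
whether an itemset is closed. The toy transactions and the helper names
`support` and `is_closed` are hypothetical illustrations, not part of Spark
MLlib. The sketch relies on one extra observation: if some proper superset Y of
X had the same support as X, then so would X plus any single item of Y that is
not in X, because support can only decrease as items are added. Testing the
one-item extensions is therefore enough.

```python
from typing import FrozenSet, List

Itemset = FrozenSet[str]


def support(itemset: Itemset, transactions: List[Itemset]) -> int:
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def is_closed(itemset: Itemset, transactions: List[Itemset]) -> bool:
    """X is closed iff no proper superset has the same support as X.

    Support is anti-monotone, so it suffices to check that every
    one-item extension X | {i} has strictly smaller support.
    """
    s = support(itemset, transactions)
    all_items = frozenset().union(*transactions)
    return all(support(itemset | {i}, transactions) < s
               for i in all_items - itemset)


# Hypothetical toy data set D with four transactions.
D = [frozenset(t) for t in (
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"bread", "eggs"},
    {"bread", "milk"},
)]

print(is_closed(frozenset({"milk"}), D))           # False
print(is_closed(frozenset({"bread", "milk"}), D))  # True
```

Here `{milk}` is not closed, because `{bread, milk}` has the same support (3),
while `{bread, milk}` itself is closed.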
A naive approach is to find all frequent itemsets and then remove each one that
is a proper subset of another frequent itemset with the same support. But this
can be very costly, since all frequent itemsets still have to be generated
first.
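A minimal sketch of the naive approach, assuming small in-memory data. The
brute-force enumeration in `all_frequent_itemsets` is a hypothetical toy
stand-in for FP-Growth (it is not Spark's implementation), and `filter_closed`
is the quadratic post-filter described above. Only frequent supersets need to
be compared, since a superset with the same support as a frequent itemset is
itself frequent.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List

Itemset = FrozenSet[str]


def all_frequent_itemsets(transactions: List[Itemset],
                          min_support: int) -> Dict[Itemset, int]:
    """Brute-force level-wise enumeration (toy stand-in for FP-Growth)."""
    items = sorted(frozenset().union(*transactions))
    result: Dict[Itemset, int] = {}
    for k in range(1, len(items) + 1):
        found_any = False
        for combo in combinations(items, k):
            candidate = frozenset(combo)
            s = sum(1 for t in transactions if candidate <= t)
            if s >= min_support:
                result[candidate] = s
                found_any = True
        if not found_any:
            # Anti-monotonicity: no larger itemset can be frequent either.
            break
    return result


def filter_closed(frequent: Dict[Itemset, int]) -> Dict[Itemset, int]:
    """Drop every itemset that has a proper frequent superset with equal support."""
    return {
        x: s for x, s in frequent.items()
        if not any(x < y and frequent[y] == s for y in frequent)
    }


# Hypothetical toy data set D with four transactions.
D = [frozenset(t) for t in (
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"bread", "eggs"},
    {"bread", "milk"},
)]

frequent = all_frequent_itemsets(D, min_support=2)
closed = filter_closed(frequent)
print(len(frequent), len(closed))  # 5 3
```

On this data the filter keeps 3 of the 5 frequent itemsets. The cost problem
shows up in the structure of the code: every frequent itemset must be
materialized and compared against the others before anything can be discarded.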
A more powerful idea is to perform some kind of merging during the mining
process itself. I've heard about the FPClose algorithm, which is based on
FP-Growth:
[http://users.encs.concordia.ca/~grahne/papers/fimi03.pdf] (Section 4 of the paper)
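For contrast, here is a sketch of pruning during the search itself. It is not
FPClose and does not use FP-trees; it is a simpler closure-based depth-first
search, shown only to illustrate the idea of never collecting non-closed
itemsets. The closure of an itemset X is taken here to be the intersection of
all transactions containing X: it has the same support as X and is itself
closed, so each extension step jumps directly to the next closed itemset. The
function names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set

Itemset = FrozenSet[str]


def closure(x: Itemset, transactions: List[Itemset]) -> Itemset:
    """Intersection of all transactions containing x (x must have support > 0)."""
    covering = [t for t in transactions if x <= t]
    return frozenset.intersection(*covering)


def closed_frequent(transactions: List[Itemset],
                    min_support: int) -> Dict[Itemset, int]:
    """Enumerate closed frequent itemsets without storing non-closed ones."""
    assert min_support >= 1
    if len(transactions) < min_support:
        return {}
    items = frozenset().union(*transactions)
    result: Dict[Itemset, int] = {}
    seen: Set[Itemset] = set()

    def support(x: Itemset) -> int:
        return sum(1 for t in transactions if x <= t)

    def expand(c: Itemset) -> None:
        # c is a closed itemset; try to grow it by one item at a time.
        for i in items - c:
            grown = c | {i}
            s = support(grown)
            if s < min_support:
                continue
            nxt = closure(grown, transactions)  # same support, and closed
            if nxt not in seen:
                seen.add(nxt)
                result[nxt] = s
                expand(nxt)

    root = closure(frozenset(), transactions)  # items present in every transaction
    seen.add(root)
    if root:  # do not report the empty itemset
        result[root] = len(transactions)
    expand(root)
    return result


# Hypothetical toy data set D with four transactions.
D = [frozenset(t) for t in (
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"bread", "eggs"},
    {"bread", "milk"},
)]

print(closed_frequent(D, 2) == {
    frozenset({"bread"}): 4,
    frozenset({"bread", "eggs"}): 2,
    frozenset({"bread", "milk"}): 3,
})  # True
```

On the same toy data this returns the same closed itemsets as the naive
post-filter. The `seen` set removes closed itemsets reached along different
paths; practical algorithms typically organize the search so that this
duplicate check is cheaper.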
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)