[ 
https://issues.apache.org/jira/browse/SPARK-6143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-6143:
--------------------------------
    Labels: bulk-closed  (was: )

> Improve FP-Growth for mining closed-forms of frequent patterns
> --------------------------------------------------------------
>
>                 Key: SPARK-6143
>                 URL: https://issues.apache.org/jira/browse/SPARK-6143
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>            Reporter: Denis Dus
>            Priority: Minor
>              Labels: bulk-closed
>
> It is more convenient for a person to analyze the closed forms of frequent 
> itemsets (and of patterns in general).
> An itemset X is closed in a data set D if there exists no proper super-itemset 
> Y such that Y has the same support as X in D. So the closed frequent itemsets 
> are just a lossless compression of all frequent itemsets.
> 1) A naive approach is to find all frequent itemsets and then remove each 
> one that is a proper subset of another frequent itemset with the same 
> support. But this can be very costly, as generating all frequent itemsets is 
> still needed.
> 2) A more powerful idea is to do some kind of merging during the mining 
> process. I've heard about the FPClose algorithm, which is based on FPGrowth:
> [http://users.encs.concordia.ca/~grahne/papers/fimi03.pdf] (Section 4 in 
> the paper).
> I think this could be more useful for MLlib users who are interested in 
> frequent itemset analysis.
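
The naive approach in 1) can be sketched as follows. This is only a minimal
illustration in plain Python, not Spark or MLlib code and not FPClose: the
function names frequent_itemsets and closed_itemsets are hypothetical, and the
brute-force enumeration is an assumption that only suits tiny inputs.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Brute-force miner (hypothetical helper): map each frequent itemset to its support count."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            itemset = frozenset(candidate)
            # Support = number of transactions that contain every item of the candidate.
            support = sum(1 for t in transactions if itemset <= t)
            if support >= min_support:
                result[itemset] = support
    return result


def closed_itemsets(frequent):
    """Keep only itemsets with no proper superset of equal support."""
    closed = {}
    for itemset, support in frequent.items():
        # A superset with the same support is itself frequent,
        # so it is enough to search among the frequent itemsets.
        has_equal_superset = any(
            itemset < other and support == other_support
            for other, other_support in frequent.items()
        )
        if not has_equal_superset:
            closed[itemset] = support
    return closed


transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"a"}]
frequent = frequent_itemsets(transactions, min_support=2)
closed = closed_itemsets(frequent)
```

On this toy data, {b} and {c} are dropped because {a, b} and {a, c} have the
same support, while {a} stays because its support is strictly higher than that
of any superset. The filter compares every pair of frequent itemsets, which is
the cost that mining closed itemsets directly is meant to avoid.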



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)