[
https://issues.apache.org/jira/browse/MAHOUT-629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
niu updated MAHOUT-629:
-----------------------
Attachment: FPGrowth.java
I have test the patch of MAHOUT-629 as the FPGrowth.java I attached ,but I find
it become slower than the original implementation in mahout 0.4.
The dataset I use is like this, there are so many items which more than 1000 in
one transaction.
> FP Growth performance improvement
> ---------------------------------
>
> Key: MAHOUT-629
> URL: https://issues.apache.org/jira/browse/MAHOUT-629
> Project: Mahout
> Issue Type: Improvement
> Components: Frequent Itemset/Association Rule Mining
> Affects Versions: 0.4, 0.5
> Reporter: Jaroslaw Odzga
> Assignee: Robin Anil
> Fix For: 0.6
>
> Attachments: FPGrowth.java, performance_improvement.txt
>
>
> Instead of calculating patterns ending with given attribute multiple times
> they can be calculated just once. Depending on for how many features patterns
> are generated, speedup can be huge. More feature included - greater speedup.
> For test data set (88162 real life 'basket' transactions), if all features
> were selected (i.e. we want to generate patterns for all items in
> transactions), patterns generation time dropped from 1h 15min to 8sec. For
> parallel fpgrowth, where the number of requested features is limited the
> speedup is not that dramatic, but still noticeable. Basically work done is
> always smaller than before the patch (as patterns for each item are
> calculated at most once).
> The improvement is a variation of base algorithm in situation where we want
> to generate patterns for only subset of items (let's call this subset A).
> Given that items are ordered by descending frequency it is enough to
> calculate only patterns ending on any item with frequency smaller or equal to
> the most frequent item in the subset A. The heaps for each item are
> initialized upfront and merged after processing every item.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira