[ https://issues.apache.org/jira/browse/MAHOUT-617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008244#comment-13008244 ]
Vipul Pandey commented on MAHOUT-617:
-------------------------------------
Robin,
The output that I'm getting is:
11 X Y Z
11 X Y Z
11 X Y Z
21 X Y
21 X Y
25 Y
33 X
That's the same output that you expect according to your test case:
assertEquals(
    "[(Z,([X, Y, Z],11)), (Y,([Y],25), ([X, Y],21), ([X, Y, Z],11)), "
    + "(X,([X],33), ([X, Y],21), ([X, Y, Z],11))]",
But the output we expect is:
11 Z Y X
11 Z Y
11 Z X
11 Z
21 Y X
25 Y
33 X
I don't see the subsets ZY, XZ and Z in the output, although they all have to
be frequent. Instead, XYZ is reported 3 times (I assume that's once each for
X, Y and Z) and XY is reported twice.
Am I missing something?
If not, then how do I get to the actual output?
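To make the expected counts above concrete, here is a minimal brute-force sketch. It assumes the toy dataset from the issue description (X alone 12 times, XY 10 times, Y alone 4 times, XYZ 11 times) and simply counts, for every non-empty subset of {X, Y, Z}, the transactions that contain it. The class and method names (SupportSketch, issueData, support, frequentItemsets) are hypothetical and are not part of Mahout's FPGrowth/PFPGrowth API; this only illustrates the definition of support and is not how FPGrowth computes it.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustrative only: hypothetical names, not Mahout code.
final class SupportSketch {

    // Toy dataset assumed from the issue: distinct transaction -> multiplicity.
    static Map<Set<String>, Integer> issueData() {
        Map<Set<String>, Integer> data = new LinkedHashMap<>();
        data.put(new TreeSet<>(Arrays.asList("X")), 12);
        data.put(new TreeSet<>(Arrays.asList("X", "Y")), 10);
        data.put(new TreeSet<>(Arrays.asList("Y")), 4);
        data.put(new TreeSet<>(Arrays.asList("X", "Y", "Z")), 11);
        return data;
    }

    // Support of an itemset: number of transactions containing all of its items.
    static int support(Set<String> itemset, Map<Set<String>, Integer> data) {
        int total = 0;
        for (Map.Entry<Set<String>, Integer> e : data.entrySet()) {
            if (e.getKey().containsAll(itemset)) {
                total += e.getValue();
            }
        }
        return total;
    }

    // Brute force over all non-empty subsets of `items` (fine for three items),
    // keeping those whose support is at least minSupport.
    static Map<Set<String>, Integer> frequentItemsets(
            List<String> items, Map<Set<String>, Integer> data, int minSupport) {
        Map<Set<String>, Integer> result = new LinkedHashMap<>();
        for (int mask = 1; mask < (1 << items.size()); mask++) {
            Set<String> candidate = new TreeSet<>();
            for (int i = 0; i < items.size(); i++) {
                if ((mask & (1 << i)) != 0) {
                    candidate.add(items.get(i));
                }
            }
            int s = support(candidate, data);
            if (s >= minSupport) {
                result.put(candidate, s);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Print "<support> <items>" for every itemset with support >= 11.
        frequentItemsets(Arrays.asList("X", "Y", "Z"), issueData(), 11)
            .forEach((set, s) -> System.out.println(s + " " + String.join(" ", set)));
    }
}
```

With a minimum support of 11 this lists seven itemsets, including Z, XZ and YZ at 11 each, which are the ones missing from the output I am getting.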
> FPGrowth/PFPGrowth giving out wrong results.
> ---------------------------------------------
>
> Key: MAHOUT-617
> URL: https://issues.apache.org/jira/browse/MAHOUT-617
> Project: Mahout
> Issue Type: Bug
> Components: Frequent Itemset/Association Rule Mining
> Affects Versions: 0.4
> Environment: Mac OS X, Linux
> Reporter: Vipul Pandey
> Assignee: Robin Anil
> Labels: AssociationMining, FPGrowth, FrequentItemsets
> Attachments: XY, XYZ
>
>
> FPGrowth reports the support of itemsets individually. That is, if item X
> appears "individually" 12 times and appears with item Y 10 times (a total of
> 22 times), and item Y appears "individually" 4 times (a total of 14 times),
> then this is what the output will be (say, for min-support 2):
> 12 X
> 10 XY
> 4 Y
> Instead of
> 22 X
> 10 XY
> 14 Y
> Also, because of this, if the minimum support is 5 then the output will look
> like:
> 12 X
> 10 X Y
> thus totally ignoring Y.
> If the minimum support is 11 then the output will look like:
> 12 X
> again ignoring Y.
> If the minimum support is 13 then there will be NO output, even though X's
> support was 22 and Y's was 14 all along.
> Even if we want to show just the maximal itemsets (although I would like to
> see ALL the frequent itemsets, maximal or not), this output is wrong: with
> a support of 13 we should still have seen X(22) and Y(14).
> Now say you add XYZ 11 times.
> For support 1 you'd see:
> 12 X
> 10 X Y
> 11 X Y Z
> 4 Y
> And for support 11 you'd see:
> 12 X
> 11 X Y Z
> Although I'd expect the output (for both s=1 & s=11) to be
> 33 X
> 25 Y
> 21 XY
> 11 Z
> 11 XZ
> 11 YZ
> 11 XYZ
> Attached are the sample inputs: