[ https://issues.apache.org/jira/browse/MAHOUT-617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vipul Pandey updated MAHOUT-617:
--------------------------------

    Description: 
FPGrowth reports the support of each itemset "individually": it appears to count
only the transactions that consist of exactly that itemset. Suppose item X
appears on its own 12 times and together with item Y 10 times (22 times in
total), and item Y appears on its own 4 times (14 times in total). With a
minimum support of 2, the output is:

12 X
10 XY
4  Y

instead of the expected:
22 X
10 XY
14 Y

Because of this, with a minimum support of 5 the output looks like:
12 X
10 X Y
which ignores Y entirely.

With a minimum support of 11 the output is:
12 X
again ignoring Y.

With a minimum support of 13 there is NO output at all, even though X's
support is 22 and Y's is 14.



Even if only the maximal itemsets were meant to be shown (although I would like
to see ALL frequent itemsets, maximal or not), this output is wrong: with a
minimum support of 13 we should still see X (22) and Y (14).
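A minimal brute-force sketch (Python, not Mahout code) of the support semantics expected above, assuming the first scenario's transactions are X alone 12 times, X Y 10 times and Y alone 4 times. Support here means the number of transactions that contain the itemset, so a threshold of 13 still keeps X and Y. The function names are hypothetical.

```python
from itertools import combinations

# Hypothetical reconstruction of the first scenario:
# X alone 12 times, X and Y together 10 times, Y alone 4 times.
transactions = [{"X"}] * 12 + [{"X", "Y"}] * 10 + [{"Y"}] * 4


def support(itemset, transactions):
    """Count the transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def frequent_itemsets(transactions, min_support):
    """Brute force: score every non-empty subset of the item universe."""
    items = sorted(set().union(*transactions))
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            s = support(frozenset(combo), transactions)
            if s >= min_support:
                result[combo] = s
    return result


for min_support in (2, 5, 11, 13):
    print(min_support, frequent_itemsets(transactions, min_support))
```

For example, `frequent_itemsets(transactions, 13)` returns `{('X',): 22, ('Y',): 14}`. Brute force is only practical for tiny examples like this one; it is meant as a reference for the expected numbers, not as a replacement for FP-growth.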


Now suppose XYZ is added 11 times (11 more transactions containing X, Y and Z).


With a minimum support of 1 you get:
12 X
10 X Y
11 X Y Z
4   Y




And with a minimum support of 11:
12 X
11 X Y Z

whereas I would expect the output (for both s=1 and s=11) to be:
33 X
25 Y 
21 XY
11 Z
11 XZ
11 YZ
11 XYZ
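A sketch of the same computation for the XYZ case, this time counting transaction by transaction: every non-empty subset of each transaction is credited with one occurrence. The input is a hypothetical reconstruction of the scenario described above, and the helper names are illustrative, not Mahout's API. It reproduces the seven expected supports and shows that raising the threshold from 1 to 11 removes nothing here.

```python
from collections import Counter
from itertools import combinations

# Hypothetical reconstruction: the first scenario's transactions
# plus {X, Y, Z} added 11 times.
transactions = (
    [("X",)] * 12 + [("X", "Y")] * 10 + [("Y",)] * 4 + [("X", "Y", "Z")] * 11
)


def count_all_subsets(transactions):
    """Credit every non-empty subset of each transaction with one occurrence."""
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        for k in range(1, len(items) + 1):
            counts.update(combinations(items, k))
    return counts


def frequent(counts, min_support):
    """Keep the itemsets whose support meets the threshold."""
    return {itemset: s for itemset, s in counts.items() if s >= min_support}


counts = count_all_subsets(transactions)
# The threshold only filters itemsets; it never changes a reported support.
assert frequent(counts, 1) == frequent(counts, 11)
for itemset, s in sorted(frequent(counts, 11).items(), key=lambda kv: (-kv[1], kv[0])):
    print(s, " ".join(itemset))
```

Enumerating every subset of every transaction grows exponentially with transaction length, so this is only a way to check small examples.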


The sample input (XYZ) is attached.

  was:
PFPGrowth gives wrong results on my data. Attached are:
- The input data
- The output (sequence file) generated by FPGrowth (PFPGrowth gives the same results)
- The same output as text


$ cat part-r-00000 | grep 1678807047
12      1678807047
38      1678807047 3159925415

This says that the support of the item 1678807047 (12) is less than the
support of a pair containing that item (38), which is impossible: every
transaction that contains the pair also contains the item.


Another example:
$ cat part-r-00000  | grep 1441690161
12              1441690161 3910019844
18              1604285941 1441690161 3910019844
75              1441690161
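Both examples break the downward-closure (anti-monotone) property of support: adding items to an itemset can never raise its support. A small hypothetical checker (not part of Mahout) that parses the grep output quoted above and flags such pairs might look like this; it assumes each output line is a support count followed by whitespace-separated item IDs.

```python
from itertools import combinations

# The grep results quoted in this report, assumed to be
# "support<whitespace>item item ..." per line.
REPORTED = """\
12\t1678807047
38\t1678807047 3159925415
12\t1441690161 3910019844
18\t1604285941 1441690161 3910019844
75\t1441690161
"""


def parse(text):
    """Map each reported itemset (as a frozenset) to its support."""
    supports = {}
    for line in text.splitlines():
        fields = line.split()
        if fields:
            supports[frozenset(fields[1:])] = int(fields[0])
    return supports


def closure_violations(supports):
    """Yield (subset, superset) pairs where the superset has higher support.

    An itemset that contains another can occur in at most as many
    transactions, so support must never increase as items are added.
    """
    for itemset, s in supports.items():
        for k in range(1, len(itemset)):
            for combo in combinations(sorted(itemset), k):
                sub = frozenset(combo)
                if sub in supports and supports[sub] < s:
                    yield sub, itemset


for sub, sup in closure_violations(parse(REPORTED)):
    print(sorted(sub), "<", sorted(sup))
```

Only subsets that actually appear in the output can be checked, so this finds violations but cannot prove their absence.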


Runtime parameters:
-i baskets/part-r-00000 -o patterns -k 50 -method sequential -g 10 -regex '[\t]' -s 10


NOTE: I was unable to attach files to JIRA. Here is the bundle of files (input,
sequence-file output and text output): https://files.me.com/vpandey/glsovt





> FPGrowth/PFPGrowth giving out wrong results. 
> ---------------------------------------------
>
>                 Key: MAHOUT-617
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-617
>             Project: Mahout
>          Issue Type: Bug
>          Components: Frequent Itemset/Association Rule Mining
>    Affects Versions: 0.4
>         Environment: Mac OS X, Linux
>            Reporter: Vipul Pandey
>            Assignee: Robin Anil
>              Labels: AssociationMining, FPGrowth, FrequentItemsets
>         Attachments: XYZ
>

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
