[
https://issues.apache.org/jira/browse/MAHOUT-617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Vipul Pandey updated MAHOUT-617:
--------------------------------
Description:
PFPGrowth produces wrong results on my data. Attached are:
- The input data
- The output (sequence file) generated by FPGrowth (PFPGrowth gives the same results)
- The output as text
$ cat part-r-00000 | grep 1678807047
12 1678807047
38 1678807047 3159925415
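This says that the support (12) for the item (1678807047) is less than the
support (38) of a pair containing that item. That cannot be right: every basket
that contains the pair also contains the item, so a pair can never have higher
support than either of its items.
Another example, where a triple (support 18) is reported with higher support
than a pair it contains (support 12):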
$ cat part-r-00000 | grep 1441690161
12 1441690161 3910019844
18 1604285941 1441690161 3910019844
75 1441690161
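The inconsistency in both examples can be checked mechanically. Below is a minimal sketch in Python, assuming each line of the text output is a support count followed by the items of the pattern, separated by whitespace (as in the grep output above). The function names `parse_patterns` and `find_violations` are hypothetical helpers, not part of Mahout. If the same itemset is listed more than once, this sketch keeps only the last support it sees.

```python
from itertools import combinations


def parse_patterns(lines):
    """Map each itemset (a frozenset of item ids) to its reported support."""
    supports = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        supports[frozenset(fields[1:])] = int(fields[0])
    return supports


def find_violations(supports):
    """Find patterns reported with higher support than one of their subsets.

    Only subsets that are themselves present in the output are compared,
    since a top-k output does not list every subset.
    """
    violations = []
    for superset, super_count in supports.items():
        for size in range(1, len(superset)):
            for combo in combinations(sorted(superset), size):
                subset = frozenset(combo)
                sub_count = supports.get(subset)
                if sub_count is not None and sub_count < super_count:
                    violations.append((subset, sub_count, superset, super_count))
    return violations


# The five lines quoted in this report.
sample = [
    "12 1678807047",
    "38 1678807047 3159925415",
    "12 1441690161 3910019844",
    "18 1604285941 1441690161 3910019844",
    "75 1441690161",
]

violations = find_violations(parse_patterns(sample))
for subset, sub_count, superset, super_count in violations:
    print(sorted(subset), sub_count, "<", sorted(superset), super_count)
```

Running the same check over the complete text output (for example, by passing the lines of an open file) would show how widespread the problem is.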
Runtime parameters:
-i baskets/part-r-00000 -o patterns -k 50 -method sequential -g 10 -regex '[\t]' -s 10
NOTE: I was unable to attach files to JIRA. Here is the bundle of files (Input,
SequenceOutput & TextOutput): https://files.me.com/vpandey/glsovt
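To find out which of the reported supports is actually wrong, the true support of an itemset can be counted directly from the input. This is a rough sketch in Python, assuming the input is plain text with one basket per line and items separated by tabs (matching the `-regex '[\t]'` splitter above). `count_support` is a hypothetical helper and the baskets shown are made-up toy data, not the attached input.

```python
import re


def count_support(basket_lines, itemset, splitter=r"[\t]"):
    """Count the baskets that contain every item of the given itemset."""
    wanted = set(itemset)
    pattern = re.compile(splitter)
    count = 0
    for line in basket_lines:
        items = set(pattern.split(line.rstrip("\n")))
        if wanted <= items:  # all wanted items occur in this basket
            count += 1
    return count


# Toy baskets for illustration only.
baskets = ["a\tb\tc\n", "a\tb\n", "a\tc\n", "b\n"]

support_a = count_support(baskets, ["a"])
support_ab = count_support(baskets, ["a", "b"])
print(support_a, support_ab)
```

With the real data, the lines of the input file would be passed in place of `baskets`, and the counts for 1678807047 and for the pair 1678807047, 3159925415 compared with the 12 and 38 reported above.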
was:
PFPGrowth with my data is giving out wrong results. Attached are :
- The input data
- The output (sequence file) generated by FPGrowth (PFPGrowth gives the same
results)
- Output as text
$ cat part-r-00000 | grep 1678807047
12 1678807047
38 1678807047 3159925415
which says that the support (12) for the item (1678807047) is lesser than the
support (38) of a pair containing that item.
another example
$ cat part-r-00000 | grep 1441690161
12 1441690161 3910019844
18 1604285941 1441690161 3910019844
75 1441690161
Runtime parameters :
-i baskets/part-r-00000 -o patterns -k 50 -method sequential -g 10 -regex
'[\t]' -s 10
> FPGrowth/PFPGrowth giving out wrong results.
> ---------------------------------------------
>
> Key: MAHOUT-617
> URL: https://issues.apache.org/jira/browse/MAHOUT-617
> Project: Mahout
> Issue Type: Bug
> Components: Frequent Itemset/Association Rule Mining
> Affects Versions: 0.4
> Environment: Mac OS X, Linux
> Reporter: Vipul Pandey
> Assignee: Robin Anil
> Labels: AssociationMining, FPGrowth, FrequentItemsets