[
https://issues.apache.org/jira/browse/MAHOUT-625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13006892#comment-13006892
]
Robin Anil commented on MAHOUT-625:
-----------------------------------
Jaroslaw, can you submit a final patch with the bug fix, the test and the dataset?
Please also move the optimization over to a new issue; I want to test it more and
maybe add a command-line flag to enable it. The rest looks fine to commit.
Thanks again for taking the initiative on getting the dataset in, and kudos for
the fix. It was not easy to figure out.
> Some of generated patterns have support higher than in reality
> --------------------------------------------------------------
>
> Key: MAHOUT-625
> URL: https://issues.apache.org/jira/browse/MAHOUT-625
> Project: Mahout
> Issue Type: Bug
> Components: Frequent Itemset/Association Rule Mining
> Affects Versions: 0.4
> Reporter: Jaroslaw Odzga
> Priority: Critical
> Attachments: MAHOUT-625-patch.txt, bugfix-patch.txt, dataset_ok.txt,
> mahout-test.zip
>
>
> It turns out that some of the generated patterns have incorrect support: the
> returned support is slightly higher than the true one.
> I attached a test which proves that FPGrowth has a bug. The test uses the
> retail dataset found here: http://fimi.ua.ac.be/data/
> The pattern (36, 39, 41) occurs in the transactions 572 times (this count is
> also calculated in the test), but FPGrowth returns pattern (36, 39, 41) with
> support 573.
> Please note that this is not the only pattern with incorrect support; the test
> only points out one example to have something to focus on. There are plenty
> more patterns whose support is higher than the real one. The biggest
> difference I noticed was a support 8 higher than the real one for one of the
> patterns.
> Please find attached the failing unit test. It is actually a Maven project,
> which contains the test data and is ready to run.
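To make the 572 vs. 573 discrepancy concrete, here is a minimal sketch of a brute-force support check: count the transactions that contain every item of a pattern, then subtract that from the support FPGrowth reports. It is not the attached test and not Mahout API; the class SupportCheck and its methods are hypothetical. It assumes the FIMI layout of retail.dat, with one transaction per line and integer items separated by whitespace.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper (not part of Mahout): brute-force support counting. */
public final class SupportCheck {

  private SupportCheck() {}

  /**
   * Returns the number of transactions that contain every item of the pattern.
   * Each string is one transaction in the assumed FIMI format: whitespace-separated integers.
   */
  public static int support(List<String> transactions, Set<Integer> pattern) {
    int count = 0;
    for (String line : transactions) {
      String trimmed = line.trim();
      if (trimmed.isEmpty()) {
        continue; // blank lines are not transactions
      }
      Set<Integer> items = new HashSet<>();
      for (String token : trimmed.split("\\s+")) {
        items.add(Integer.parseInt(token));
      }
      if (items.containsAll(pattern)) {
        count++;
      }
    }
    return count;
  }

  /** Reported support minus true support; a positive value means the pattern was overcounted. */
  public static int overestimate(int reportedSupport, List<String> transactions, Set<Integer> pattern) {
    return reportedSupport - support(transactions, pattern);
  }
}
```

For example, loading a local copy of retail.dat with `Files.readAllLines(Paths.get("retail.dat"))` (the path is an assumption) and calling `overestimate(573, lines, new HashSet<>(Arrays.asList(36, 39, 41)))` should return 1 if the counts quoted above are right. Brute force is slow for many patterns, but it is independent of the FP-tree, which is what makes it useful as a reference.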