Some of generated patterns have support higher than in reality
--------------------------------------------------------------

                 Key: MAHOUT-625
                 URL: https://issues.apache.org/jira/browse/MAHOUT-625
             Project: Mahout
          Issue Type: Bug
          Components: Frequent Itemset/Association Rule Mining
    Affects Versions: 0.4
            Reporter: Jaroslaw Odzga
            Priority: Critical


It turns out that some of the generated patterns have incorrect support: the 
returned support is slightly higher than the true one.
I attached a test which demonstrates the bug in FPGrowth. The test uses the 
retail data set found here: http://fimi.ua.ac.be/data/
The pattern (36, 39, 41) occurs in 572 transactions (the test also computes 
this count), but FPGrowth returns the pattern (36, 39, 41) with support 
573.
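
For reference, the true support of a pattern can be checked by brute force: 
scan every transaction and count those that contain all items of the pattern. 
The following is only a minimal sketch of such a check, not the attached test 
itself. The class name SupportCounter and the method countSupport are 
hypothetical, the sample transactions are made up for illustration, and the 
input is assumed to hold one transaction per line with whitespace-separated 
integer item ids.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Mahout: brute-force support counting.
public class SupportCounter {

    // Counts the transactions that contain every item of the pattern.
    // Assumes one transaction per line, items separated by whitespace.
    static int countSupport(List<String> transactions, Set<Integer> pattern) {
        int support = 0;
        for (String line : transactions) {
            // Collect the items into a set, so a repeated item counts once.
            Set<Integer> items = new HashSet<>();
            for (String token : line.trim().split("\\s+")) {
                if (!token.isEmpty()) {
                    items.add(Integer.parseInt(token));
                }
            }
            // A transaction supports the pattern only if it holds all items.
            if (items.containsAll(pattern)) {
                support++;
            }
        }
        return support;
    }

    public static void main(String[] args) {
        // Made-up sample data, not taken from the retail data set.
        List<String> transactions = Arrays.asList(
            "36 38 39 41",
            "39 41 48",
            "36 39 41 36",
            "32 36 39");
        Set<Integer> pattern = new HashSet<>(Arrays.asList(36, 39, 41));
        System.out.println(countSupport(transactions, pattern)); // prints 2
    }
}
```

To run the same check against the retail data, the lines of the downloaded 
file could be read into the list instead of the sample strings, and the 
result compared with the support that FPGrowth reports for the same pattern.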

Please note that this pattern is not the only one with incorrect support - 
the test only points out one example to have something to focus on. There are 
plenty more patterns with support higher than the real one. The biggest 
difference I noticed was a support 8 higher than the real one for one pattern.

Please find attached the failing unit test - it's actually a Maven project, 
which contains the test data and is ready to run.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira