[ 
https://issues.apache.org/jira/browse/MAHOUT-157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763929#action_12763929
 ] 

Isabel Drost commented on MAHOUT-157:
-------------------------------------

Great work Robin. I just had a look at the code and only found some minor 
things: 

ParallelFPGrowth 
- it might be a good idea to reuse the DefaultOptionCreator to generate common 
options like input and output. 
- I would love to see a help option as well. 
- What happens, if the users gives the wrong parameters? As a user, I would 
rather not get confronted with a stack trace, even though it is an example. 
- did you provide details on how to run the algorithm, the assumptions it 
makes, file format, behaviour if the output file exists already on the wiki? 
- the class is named ParallelFPGrowth, but if I read it correctly, it looks 
like the entry point for both, the parallel and sequential version. Maybe 
rename to FPGrowthJob? 

FPGrowth 
 - line 98 is this really a recoverable error that does not cause 
inconsistancies later on? Log message says "this should not happen" - what if 
against all odds, it does happen? Why not throw a non-Checked Exception? 
- line 177 we should not have source code that is commented out in newly added 
code. 
- The class seems to implement both - top k and vanilla fp growth - would it 
make sense to split that up into different classes? 
- generateFrequentPatterns - maybe it is just me, but I am always happy to find 
tiny little comments in methods that long that very shortly explain what the 
following code block is doing. 

FPTreeDepthCache 
- maybe mention in the docs that the implementation is not threadsafe? 

FPTree, Pattern 
- missing a class comment. 

Pattern 
- line 173 - please remove code that is commented out 

AggregatorMapper 
- the reporter is left unused. 

Nice-To-Have: It would be nice to have package level comments in JavaDoc as 
well.

> Frequent Pattern Mining using Parallel FP-Growth
> ------------------------------------------------
>
>                 Key: MAHOUT-157
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-157
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Frequent Itemset/Association Rule Mining
>    Affects Versions: 0.2
>            Reporter: Robin Anil
>            Assignee: Robin Anil
>             Fix For: 0.2
>
>         Attachments: MAHOUT-157-August-17.patch, MAHOUT-157-August-24.patch, 
> MAHOUT-157-August-31.patch, MAHOUT-157-August-6.patch, 
> MAHOUT-157-Combinations-BSD-License.patch, 
> MAHOUT-157-Combinations-BSD-License.patch, 
> MAHOUT-157-inProgress-August-5.patch, MAHOUT-157-Oct-1.patch, 
> MAHOUT-157-Oct-8.pfpgrowth.patch, 
> MAHOUT-157-Oct-8.TestedMapReducePipeline.patch, 
> MAHOUT-157-September-10.patch, MAHOUT-157-September-18.patch, 
> MAHOUT-157-September-5.patch
>
>
> Implement: http://infolab.stanford.edu/~echang/recsys08-69.pdf

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to