[ 
https://issues.apache.org/jira/browse/MAHOUT-157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12752249#action_12752249
 ] 

Isabel Drost commented on MAHOUT-157:
-------------------------------------

The formatting still looks a bit weird (spaces, line length etc.)

PFPGrowth, line 101, 183, 214 - please at least add a warning log message prior 
to deleting pre-existing output path and document somewhere on the usage page 
that default behaviour is deleting the output path, if exists. (I think that 
differs from the implementation in lda - we need to agree on consistant 
behaviour across mahout in such cases).

line 174 - the combiner is commented out?

ParallelCountingMapper - shouldn't you report status through the reporter 
during mapping?

ParallelFPGrowthMapper - line 87 - please do not use e.printStackTrace() but 
generate a regular log message and log the exception stack trace through the 
logger.

I would love to see some more comments: the expected format of key and value, 
the expected content of glist and flist.

ParallelFPGrowthReducer line 111 - don't use e.printStackTrace.

AggregatorReducer - line 91 same

Attribute/TreeNode - The code is pretty clear, still I would love to see some 
more documentation on the overall data structure.

FrequentPatternMaxHeap - line 74 - Huh? Judging from the return value, you can 
omit the comparison against null here. (line 81 same.

FPGrowth - line 42 - the method name should not start with a capital letter.

517 lines for implementing the whole algorithm in one class - looks a bit large 
for me. Is it possible to split it up?

line 165 - converting from Integer to int and back again usually costs quite a 
bit of performance. Is there a way to rely on primitives only, or implement 
your own incrementable integer type? Btw., Integer.valueOf(1) should be 
replaced by Integer.ONE - that should be quicker and prevent in-accuracies.

Type T - I think it would make the code better readable if T were given a 
clearer name, something like TransactionType? Otherwise you need to document 
what exactly T represents.

line 229: Would reformulating the while conditions as 
"while(!tempNode.childNodes.isEmpty()) { ... } make the code clearer here?

line 239: Where does the magic number 6 come from here? Define as a constant 
with a speaking name?

the two generatedSinglePathPatterns methods look rather similar - is it 
possible to not copy the code but extract it into its own method or reuse one 
in the other?

line 293 (and earlier): Where does the magic number 4 come from? Define 
constant with speaking name?

line 506: Looks like a strange log message?

Concerning your idea of going the algorithm interface way for fpGrowth: If you 
can already make out what the interface should look like, I think that would be 
a good way to make it easier for future implementors of other frequent itemset 
algorithms.

> Frequent Pattern Mining using Parallel FP-Growth
> ------------------------------------------------
>
>                 Key: MAHOUT-157
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-157
>             Project: Mahout
>          Issue Type: New Feature
>    Affects Versions: 0.2
>            Reporter: Robin Anil
>             Fix For: 0.2
>
>         Attachments: MAHOUT-157-August-17.patch, MAHOUT-157-August-24.patch, 
> MAHOUT-157-August-31.patch, MAHOUT-157-August-6.patch, 
> MAHOUT-157-Combinations-BSD-License.patch, 
> MAHOUT-157-Combinations-BSD-License.patch, 
> MAHOUT-157-inProgress-August-5.patch, MAHOUT-157-September-5.patch
>
>
> Implement: http://infolab.stanford.edu/~echang/recsys08-69.pdf

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to