[ 
https://issues.apache.org/jira/browse/MAHOUT-1355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13821621#comment-13821621
 ] 

Suneel Marthi commented on MAHOUT-1355:
---------------------------------------

[~smoens] Thanks for this patch. Some comments based on a very cursory first 
pass through the code, without considering the actual algorithm and its 
implementation.

a)  Use Guava APIs where appropriate.

    For example,

        Map<Integer,MutableLong> counts = new HashMap<Integer,MutableLong>();

    could be replaced by

        Map<Integer,MutableLong> counts = Maps.newHashMap();
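
    The factory call is shorter because it is a generic static method whose
    type arguments are inferred from the assignment target. A minimal sketch
    of that mechanism follows. So that it runs without extra jars, it uses a
    local newHashMap() helper and a reduced MutableLong; both are stand-ins
    written for this illustration, not the Guava or commons-lang sources. In
    the patch itself the call would be Maps.newHashMap() with an import of
    com.google.common.collect.Maps.

```java
import java.util.HashMap;
import java.util.Map;

public class NewHashMapSketch {

  // Stand-in for Guava's Maps.newHashMap(): K and V are inferred from the
  // type of the variable the result is assigned to.
  public static <K, V> Map<K, V> newHashMap() {
    return new HashMap<K, V>();
  }

  // Stand-in for commons-lang's MutableLong, reduced to what is used here.
  public static final class MutableLong {
    private long value;

    public void increment() {
      value++;
    }

    public long longValue() {
      return value;
    }
  }

  // Counts how often each item id occurs in the given array.
  public static Map<Integer, MutableLong> countItems(int[] items) {
    // No type arguments repeated on the right-hand side.
    Map<Integer, MutableLong> counts = newHashMap();
    for (int item : items) {
      MutableLong count = counts.get(item);
      if (count == null) {
        count = new MutableLong();
        counts.put(item, count);
      }
      count.increment();
    }
    return counts;
  }

  public static void main(String[] args) {
    Map<Integer, MutableLong> counts = countItems(new int[] {3, 7, 3, 3, 7});
    // Item 3 occurs three times, item 7 twice.
    System.out.println(counts.get(3).longValue() + " " + counts.get(7).longValue());
  }
}
```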

b)  Classes that actually launch MR jobs should extend Mahout's AbstractJob and 
    leverage the helper methods it provides.

    For example,

        public class BigFIMDriver extends Configured implements Tool

    can be replaced by

        public class BigFIMDriver extends AbstractJob

    Replace

        Job job = new Job(conf, "Apriori Phase" + i);

        /// and all of the code that follows this

    by

        Job bigFmJob = prepareJob(.....);
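
    To make the suggestion concrete, here is a rough sketch of a driver built
    on AbstractJob. It is not the patch's code: the class name
    BigFIMDriverSketch is hypothetical, and Hadoop's stock TokenCounterMapper
    and IntSumReducer are placeholders for the real Apriori mapper and
    reducer, which are not shown in this comment. The job name and the choice
    of text input and output are likewise assumptions. It needs the Mahout
    and Hadoop jars on the classpath.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.map.TokenCounterMapper;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer;
import org.apache.hadoop.util.ToolRunner;
import org.apache.mahout.common.AbstractJob;

// Hypothetical driver; AbstractJob already extends Configured and
// implements Tool, so neither needs to be declared here.
public class BigFIMDriverSketch extends AbstractJob {

  @Override
  public int run(String[] args) throws Exception {
    // Register the standard --input and --output options.
    addInputOption();
    addOutputOption();

    // parseArguments returns null when parsing fails or help was requested.
    if (parseArguments(args) == null) {
      return -1;
    }

    Path input = getInputPath();
    Path output = getOutputPath();

    // One call sets the paths, formats, mapper, reducer and key/value
    // classes. The mapper and reducer are placeholders (see lead-in).
    Job job = prepareJob(input, output,
        TextInputFormat.class,
        TokenCounterMapper.class, Text.class, IntWritable.class,
        IntSumReducer.class, Text.class, IntWritable.class,
        TextOutputFormat.class);
    job.setJobName("Apriori Phase 1"); // assumed name, mirrors the original

    return job.waitForCompletion(true) ? 0 : -1;
  }

  public static void main(String[] args) throws Exception {
    ToolRunner.run(new Configuration(), new BigFIMDriverSketch(), args);
  }
}
```

    With this shape the driver picks up the common command-line options and
    job setup from AbstractJob instead of configuring each Job by hand.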


Could you post this patch on Reviewboard? That would make it much easier to 
comment on and review.

https://reviews.apache.org

> Frequent Pattern Mining algorithms for Mahout
> ---------------------------------------------
>
>                 Key: MAHOUT-1355
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1355
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Frequent Itemset/Association Rule Mining
>    Affects Versions: 0.9
>            Reporter: Sandy Moens
>            Priority: Minor
>         Attachments: MAHOUT-1355.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> We implemented frequent pattern mining algorithms for Hadoop and adapted them 
> to Mahout. We used "PFP" (now deprecated) as a benchmark, and these 
> implementations perform better in terms of speed and memory footprint. The 
> details of the implementations can be found in the paper "Frequent Pattern 
> Mining for BigData" ( http://adrem.ua.ac.be/bigfim ).
> We have been maintaining the project for a while in GitLab ( 
> https://gitlab.com/adrem/bigfim ). Documentation for adaptation ( 
> Readme-Mahout.md ) and usage in Mahout ( Mahout-wiki.md ) can be found there.
> We are open to any modification and/or improvement requests to make it more 
> worthwhile for the Mahout project. We, as the research group, volunteer to 
> maintain FPM algorithms as well.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
