[jira] [Resolved] (MAHOUT-1007) Performance improvement in recommenditembased by splitting long records

Sean Owen (JIRA) Mon, 14 May 2012 09:36:17 -0700

     [ 
https://issues.apache.org/jira/browse/MAHOUT-1007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Sean Owen resolved MAHOUT-1007.
-------------------------------

       Resolution: Not A Problem
    Fix Version/s:     (was: 0.7)

I'm going to tentatively resolve this as "NotAProblem" since I agree with 
Sebastian's guess that the real underlying issue is insufficient pruning of 
cooccurrences. We can and should revisit this if there is good reason to 
believe that this isn't the cause.
                
> Performance improvement in recommenditembased by splitting long records
> -----------------------------------------------------------------------
>
>                 Key: MAHOUT-1007
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1007
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>    Affects Versions: 0.6
>            Reporter: Bhaskar Devireddy
>            Assignee: Sean Owen
>            Priority: Minor
>         Attachments: Patch_1007.patch
>
>
> While running recommendations on the ASFEMail dataset using the example 
> script provided with Mahout, we noticed that one of the map tasks in the 
> unsymmetrify mapper job takes much longer to execute than the others.  
> Profiling suggests that the cause is the number of elements in each 
> record.  The attached patch addresses this issue by splitting longer records 
> into smaller ones, so that the data is distributed evenly among the 
> unsymmetrify map tasks.
> The patch introduces a new command line option, 
> maxSimilarityReducerVectorSize, for RecommenderJob.  In testing with 
> maxSimilarityReducerVectorSize=5000, functionality is unchanged, the 
> unsymmetrify mapper job runs several times faster on x86 architectures, and 
> CPU utilization increases.  By default the records are not split; setting 
> maxSimilarityReducerVectorSize to a value greater than 0 enables splitting, 
> which improved performance in these tests.
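
The record-splitting idea described in the quoted report can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in Patch_1007.patch: the class RecordSplitter and the method split are hypothetical names, and a plain java.util.List stands in for whatever vector type the job actually uses. As in the report, a limit of 0 or less leaves the record unsplit.

```java
import java.util.ArrayList;
import java.util.List;

public class RecordSplitter {

    // Hypothetical helper (not from Patch_1007.patch): cuts one long record
    // into consecutive chunks of at most maxSize elements each.
    // A maxSize of 0 or less means "do not split", mirroring the default
    // behavior described for maxSimilarityReducerVectorSize.
    static <T> List<List<T>> split(List<T> record, int maxSize) {
        List<List<T>> chunks = new ArrayList<>();
        if (maxSize <= 0 || record.size() <= maxSize) {
            // Nothing to split: return the record as a single chunk.
            chunks.add(new ArrayList<>(record));
            return chunks;
        }
        for (int start = 0; start < record.size(); start += maxSize) {
            int end = Math.min(start + maxSize, record.size());
            // Copy the sub-range so each chunk is independent of the input list.
            chunks.add(new ArrayList<>(record.subList(start, end)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<Integer> record = new ArrayList<>();
        for (int i = 0; i < 12; i++) {
            record.add(i);
        }
        // With a limit of 5, the 12-element record becomes chunks of 5, 5 and 2.
        for (List<Integer> chunk : split(record, 5)) {
            System.out.println(chunk.size() + " elements: " + chunk);
        }
    }
}
```

Bounding the chunk size caps the work any single map task receives for one record, which is presumably why the report saw more even task durations; whether that or cooccurrence pruning is the right fix is the question the resolution above leaves open.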

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
