[ https://issues.apache.org/jira/browse/MAHOUT-1007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen resolved MAHOUT-1007.
-------------------------------
Resolution: Not A Problem
Fix Version/s: (was: 0.7)
I'm going to tentatively resolve this as "NotAProblem", since I agree with
Sebastian's guess that the real underlying issue is insufficient pruning of
cooccurrences. We can and should revisit this if there is good reason to
believe that this isn't the cause.
> Performance improvement in recommenditembased by splitting long records
> -----------------------------------------------------------------------
>
> Key: MAHOUT-1007
> URL: https://issues.apache.org/jira/browse/MAHOUT-1007
> Project: Mahout
> Issue Type: Improvement
> Components: Collaborative Filtering
> Affects Versions: 0.6
> Reporter: Bhaskar Devireddy
> Assignee: Sean Owen
> Priority: Minor
> Attachments: Patch_1007.patch
>
>
> While running the recommendations with the ASFEMail dataset using the example
> script provided with Mahout, we noticed that one of the map tasks in the
> unsymmetrify mapper job has a much longer execution time than the others.
> Profiling suggests that the problem is the number of elements in each
> record. The attached patch addresses this issue by splitting longer records
> into smaller ones, so that the data is distributed evenly among the
> unsymmetrify map tasks.
> A new command-line option, maxSimilarityReducerVectorSize, is introduced for
> RecommenderJob. Tested with maxSimilarityReducerVectorSize=5000: the
> functionality is unchanged, the unsymmetrify mapper job speeds up by several
> times on x86 architectures, and CPU utilization increases. By default the
> records are not split; setting the command-line option
> maxSimilarityReducerVectorSize to a value greater than 0 enables splitting
> and will increase performance.
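
For readers without the patch at hand, the splitting idea described above can
be sketched roughly as follows. This is only an illustration and is not taken
from Patch_1007.patch: the class RecordSplitter, the method split, and the
parameter maxSize (standing in for maxSimilarityReducerVectorSize) are
hypothetical names, and a plain java.util.List stands in for a Mahout vector.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of Mahout or of the attached patch. */
final class RecordSplitter {

  private RecordSplitter() {}

  /**
   * Splits one long record into consecutive chunks of at most maxSize elements.
   * A maxSize of 0 or less disables splitting, mirroring the default described
   * in the issue (records are not split unless the option is greater than 0).
   */
  static <T> List<List<T>> split(List<T> record, int maxSize) {
    List<List<T>> chunks = new ArrayList<>();
    // Splitting disabled, or the record is already small enough: keep it whole.
    if (maxSize <= 0 || record.size() <= maxSize) {
      chunks.add(new ArrayList<>(record));
      return chunks;
    }
    // Walk the record in steps of maxSize; the last chunk may be shorter.
    for (int start = 0; start < record.size(); start += maxSize) {
      int end = Math.min(start + maxSize, record.size());
      chunks.add(new ArrayList<>(record.subList(start, end)));
    }
    return chunks;
  }
}
```

Under these assumptions, split(record, 5000) on a record of 12,000 elements
would yield three chunks (5000, 5000 and 2000 elements), each of which could
then be handed to a different map task. Note that this only spreads the work
more evenly; it does not reduce the total number of cooccurrences.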