[ https://issues.apache.org/jira/browse/MAHOUT-1007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13281158#comment-13281158 ]
Bhaskar Devireddy commented on MAHOUT-1007:
-------------------------------------------
I am still experimenting with different threshold values vs. this patch using
different datasets. I will share my findings if anything interesting comes
out of these experiments.
> Performance improvement in recommenditembased by splitting long records
> -----------------------------------------------------------------------
>
> Key: MAHOUT-1007
> URL: https://issues.apache.org/jira/browse/MAHOUT-1007
> Project: Mahout
> Issue Type: Improvement
> Components: Collaborative Filtering
> Affects Versions: 0.6
> Reporter: Bhaskar Devireddy
> Assignee: Sean Owen
> Priority: Minor
> Attachments: Patch_1007.patch
>
>
> While running recommendations on the ASFEMail dataset using the example
> script provided with Mahout, we noticed that one of the map tasks in the
> unsymmetrify mapper job has a much longer execution time than the others.
> Profiling indicates that the problem is the number of elements in each
> record. The attached patch addresses this issue by splitting longer records
> into smaller ones, so that the data is distributed evenly among the
> unsymmetrify map tasks.
> The patch introduces a new command line option,
> maxSimilarityReducerVectorSize, for RecommenderJob. Tested with
> maxSimilarityReducerVectorSize=5000, it preserves the same functionality,
> speeds up the unsymmetrify mapper job by several times on x86 architectures,
> and increases CPU utilization. By default, records are not split; setting
> maxSimilarityReducerVectorSize to a value greater than 0 enables splitting,
> which increases performance.
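
The splitting idea described above can be illustrated with a minimal sketch. This is not the code in Patch_1007.patch, and it is written in Python rather than Mahout's Java for brevity. The function name split_record, the (index, value) entry layout, and the sample data are all assumptions made for illustration; only the behavior (no splitting by default, splitting when the limit is greater than 0) is taken from the description.

```python
from typing import Iterator, List, Tuple

# Hypothetical entry layout: (item index, similarity value).
Entry = Tuple[int, float]


def split_record(
    row: int, entries: List[Entry], max_size: int
) -> Iterator[Tuple[int, List[Entry]]]:
    """Yield (row, chunk) pairs holding at most max_size entries each.

    A max_size of 0 or less disables splitting, mirroring the default
    behavior described in the issue.
    """
    if max_size <= 0 or len(entries) <= max_size:
        yield row, entries
        return
    # Emit consecutive slices so that no single output record is oversized.
    for start in range(0, len(entries), max_size):
        yield row, entries[start:start + max_size]


# Sample data (made up): one record with 12 entries, limit of 5 per record.
record = [(i, 1.0 / (i + 1)) for i in range(12)]
chunks = list(split_record(7, record, max_size=5))
print([len(chunk) for _, chunk in chunks])  # [5, 5, 2]
```

Because every chunk keeps the original row key, a downstream consumer can still group the pieces back together, while the work for one very long record is spread across several smaller records.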