[ https://issues.apache.org/jira/browse/MAHOUT-1042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bhaskar Devireddy updated MAHOUT-1042:
--------------------------------------
Attachment: Mahout_1042.patch
> Hotspot in RecommenderJob-PartialMultiplyMapper-Reducer
> -------------------------------------------------------
>
> Key: MAHOUT-1042
> URL: https://issues.apache.org/jira/browse/MAHOUT-1042
> Project: Mahout
> Issue Type: Improvement
> Components: Collaborative Filtering
> Affects Versions: 0.6, 0.7
> Reporter: Bhaskar Devireddy
> Assignee: Sean Owen
> Priority: Minor
> Attachments: Mahout_1042.patch
>
>
> While profiling the PartialMultiplyMapper-Reducer job, we noticed a hotspot
> consuming more than 40% of the CPU time in the
> org.apache.mahout.math.RandomAccessSparseVector.assign method of the reducer
> task. For profiling, we ran the ASF Email recommendations script provided in
> the Mahout examples. The hotspot comes from the use of the
> Vector.plus(Vector x) method in the AggregateAndRecommendReducer class. The
> pattern used is VectorA = VectorA.plus(VectorB). In this case VectorA
> does not have to be cloned with the assign method, because the result
> replaces it. The attached patch removes the hotspot by eliminating the
> cloning in this case for the plus and times methods. The patch retains
> functionality (we verified that the output is the same with and without it)
> and speeds up the PartialMultiplyMapper-Reducer job by more than 10X on
> x86 architectures.
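
The difference between the copying pattern and in-place accumulation described above can be sketched roughly as follows. This is not the code from Mahout_1042.patch and it does not use Mahout's classes: SparseVec, addInPlace, and AccumulateDemo are hypothetical names invented for illustration, and a HashMap stands in for the sparse storage.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy sparse vector; NOT Mahout's RandomAccessSparseVector.
final class SparseVec {
    private final Map<Integer, Double> cells = new HashMap<>();

    void set(int index, double value) { cells.put(index, value); }

    double get(int index) { return cells.getOrDefault(index, 0.0); }

    // Copying addition: clones this vector, then adds other into the copy.
    // The copy is analogous to the clone step the report identifies as costly.
    SparseVec plus(SparseVec other) {
        SparseVec result = new SparseVec();
        result.cells.putAll(this.cells);
        result.addInPlace(other);
        return result;
    }

    // In-place addition: mutates this vector and visits only other's stored cells.
    SparseVec addInPlace(SparseVec other) {
        for (Map.Entry<Integer, Double> e : other.cells.entrySet()) {
            cells.merge(e.getKey(), e.getValue(), Double::sum);
        }
        return this;
    }
}

final class AccumulateDemo {
    // Pattern from the report: acc = acc.plus(v). Each step copies acc first.
    static SparseVec sumWithCopy(Iterable<SparseVec> parts) {
        SparseVec acc = new SparseVec();
        for (SparseVec v : parts) {
            acc = acc.plus(v);
        }
        return acc;
    }

    // Same result without the per-step copy: acc is updated in place.
    static SparseVec sumInPlace(Iterable<SparseVec> parts) {
        SparseVec acc = new SparseVec();
        for (SparseVec v : parts) {
            acc.addInPlace(v);
        }
        return acc;
    }

    public static void main(String[] args) {
        SparseVec a = new SparseVec();
        a.set(0, 1.0);
        a.set(2, 2.0);
        SparseVec b = new SparseVec();
        b.set(2, 3.0);
        b.set(5, 4.0);
        List<SparseVec> parts = Arrays.asList(a, b);

        SparseVec copied = sumWithCopy(parts);
        SparseVec inPlace = sumInPlace(parts);
        for (int i : new int[] {0, 2, 5}) {
            System.out.println(i + ": " + copied.get(i) + " vs " + inPlace.get(i));
        }
    }
}
```

With the copying pattern, the work per step grows with the size of the accumulator; with in-place accumulation it grows only with the size of the vector being added. That is one plausible reading of why removing the clone helps in a reducer that sums many partial vectors. The in-place form is only safe when no other code still needs the old value of the accumulator.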