Bhaskar Devireddy created MAHOUT-1042:
-----------------------------------------
Summary: Hotspot in RecommenderJob-PartialMultiplyMapper-Reducer
Key: MAHOUT-1042
URL: https://issues.apache.org/jira/browse/MAHOUT-1042
Project: Mahout
Issue Type: Improvement
Components: Collaborative Filtering
Affects Versions: 0.7, 0.6
Reporter: Bhaskar Devireddy
Assignee: Sean Owen
Priority: Minor
While profiling the PartialMultiplyMapper-Reducer job, we noticed a hotspot in
the reducer task: the org.apache.mahout.math.RandomAccessSparseVector.assign
method consumes more than 40% of the CPU time. For profiling, we used the
script provided in the Mahout examples for running ASF Email recommendations.
The hotspot comes from the use of the Vector.plus(Vector x) method in the
AggregateAndRecommendReducer class. The pattern used is
VectorA = VectorA.plus(VectorB). In this case VectorA does not have to be
cloned using the assign method, because the result replaces VectorA anyway.
The attached patch addresses the hotspot by eliminating the cloning in this
case for the plus and times methods. The patch retains functionality (we
verified that the output is the same with and without the patch) and speeds up
the PartialMultiplyMapper-Reducer job by more than 10X on x86 architectures.
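
To illustrate the difference between the two patterns, here is a minimal
sketch in plain Java. It does not use Mahout classes: SparseVectorSketch,
plus and addInPlace are hypothetical names chosen for this example, and the
attached patch may implement the change differently. The sketch only shows why
A = A.plus(B) pays for a full copy of A on every call, while an in-place
update touches only the entries of B.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a sparse vector. This is NOT Mahout's
// RandomAccessSparseVector; it only mimics the copy-vs-in-place behaviour.
class SparseVectorSketch {
    private final Map<Integer, Double> values = new HashMap<>();

    void set(int index, double value) {
        values.put(index, value);
    }

    double get(int index) {
        return values.getOrDefault(index, 0.0);
    }

    // Copying addition: clones this vector first, then adds the other one.
    // The clone costs time proportional to the size of this vector.
    SparseVectorSketch plus(SparseVectorSketch other) {
        SparseVectorSketch result = new SparseVectorSketch();
        result.values.putAll(this.values); // the clone step (the hotspot)
        for (Map.Entry<Integer, Double> e : other.values.entrySet()) {
            result.values.merge(e.getKey(), e.getValue(), Double::sum);
        }
        return result;
    }

    // In-place addition: mutates this vector and returns it.
    // The cost is proportional to the size of the other vector only.
    SparseVectorSketch addInPlace(SparseVectorSketch other) {
        for (Map.Entry<Integer, Double> e : other.values.entrySet()) {
            values.merge(e.getKey(), e.getValue(), Double::sum);
        }
        return this;
    }

    public static void main(String[] args) {
        SparseVectorSketch a = new SparseVectorSketch();
        a.set(0, 1.0);
        a.set(3, 2.0);
        SparseVectorSketch b = new SparseVectorSketch();
        b.set(3, 4.0);
        b.set(7, 5.0);

        a = a.plus(b);     // copies a on every call
        a = a.addInPlace(b); // no copy; a is updated directly
        System.out.println(a.get(3));
    }
}
```

In a reducer that accumulates many vectors into one running sum, the first
pattern copies the growing accumulator once per input vector, which is the
kind of repeated work the profile above points at.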