[
https://issues.apache.org/jira/browse/MAHOUT-1042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Schelter resolved MAHOUT-1042.
----------------------------------------
Resolution: Fixed
Fix Version/s: 0.8
Very nice find, thank you!
I changed the code here to use only .assign() on the vectors instead of
.plus() and .times().
Furthermore, I added special handling in the assign() method for PLUS_ABS and
found that two jobs in RecommenderJob can run map-only, so I could remove
the identity reducers there.
Overall this should give a huge boost to our recommenders' performance!
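To illustrate why accumulating in place is cheaper than the
VectorA = VectorA.plus(VectorB) pattern, here is a minimal sketch. It is not
the committed patch and does not use Mahout's Vector classes: ToyVector, plus()
and addInPlace() are hypothetical names on a toy map-backed vector, and the
copy counter exists only to make the cost visible.

```java
import java.util.HashMap;
import java.util.Map;

public class AccumulateSketch {

    /** Toy sparse vector (hypothetical; NOT Mahout's RandomAccessSparseVector). */
    public static final class ToyVector {
        // Counts entries copied by plus(), so both strategies can be compared.
        static long copiedEntries = 0;

        private final Map<Integer, Double> values = new HashMap<>();

        public void set(int index, double value) {
            values.put(index, value);
        }

        public double get(int index) {
            return values.getOrDefault(index, 0.0);
        }

        /** Copying addition: clones this vector, then adds other into the clone. */
        public ToyVector plus(ToyVector other) {
            ToyVector result = new ToyVector();
            result.values.putAll(values);      // the clone that dominates the cost
            copiedEntries += values.size();
            result.addInPlace(other);
            return result;
        }

        /** In-place addition: mutates this vector and visits only other's entries. */
        public ToyVector addInPlace(ToyVector other) {
            for (Map.Entry<Integer, Double> e : other.values.entrySet()) {
                values.merge(e.getKey(), e.getValue(), Double::sum);
            }
            return this;
        }
    }

    public static void main(String[] args) {
        int partials = 1000;

        // Pattern from the issue: the accumulator is re-cloned on every step.
        ToyVector copying = new ToyVector();
        for (int i = 0; i < partials; i++) {
            ToyVector partial = new ToyVector();
            partial.set(i, 1.0);
            copying = copying.plus(partial);
        }
        System.out.println("entries copied with plus(): " + ToyVector.copiedEntries);

        // In-place accumulation: no clone, work is proportional to each partial.
        ToyVector.copiedEntries = 0;
        ToyVector inPlace = new ToyVector();
        for (int i = 0; i < partials; i++) {
            ToyVector partial = new ToyVector();
            partial.set(i, 1.0);
            inPlace.addInPlace(partial);
        }
        System.out.println("entries copied in place: " + ToyVector.copiedEntries);
    }
}
```

In this sketch the copying loop re-copies the growing accumulator on every
iteration, so its cost grows roughly quadratically with the number of partial
vectors, while the in-place loop only visits the entries of each partial.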
> Hotspot in RecommenderJob-PartialMultiplyMapper-Reducer
> -------------------------------------------------------
>
> Key: MAHOUT-1042
> URL: https://issues.apache.org/jira/browse/MAHOUT-1042
> Project: Mahout
> Issue Type: Improvement
> Components: Collaborative Filtering
> Affects Versions: 0.6, 0.7
> Reporter: Bhaskar Devireddy
> Assignee: Sebastian Schelter
> Priority: Minor
> Fix For: 0.8
>
> Attachments: MAHOUT-1042.patch, Mahout_1042.patch
>
>
> While profiling the PartialMultiplyMapper-Reducer job, we noticed a hotspot
> consuming more than 40% of the CPU time in the
> org.apache.mahout.math.RandomAccessSparseVector.assign method of the reducer
> task. For profiling, we used the script provided in the Mahout examples for
> running ASF Email recommendations. The hotspot comes from the use of the
> Vector.plus(Vector x) method in the AggregateAndRecommendReducer class. The
> pattern used is VectorA = VectorA.plus(VectorB). In this case VectorA
> does not need to be cloned via the assign method. The attached patch
> addresses the hotspot by eliminating the cloning in this case for the plus
> and times methods. While retaining functionality (we verified the output
> with and without the patch), the patch speeds up execution of the
> PartialMultiplyMapper-Reducer job by more than 10x on x86 architectures.