[ 
https://issues.apache.org/jira/browse/MAHOUT-1042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Schelter resolved MAHOUT-1042.
----------------------------------------

       Resolution: Fixed
    Fix Version/s: 0.8

Very nice find, thank you!

I changed the code here so that it no longer calls .plus() or .times() on the 
vectors and uses only .assign(), which updates them in place.
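
To illustrate why this matters, here is a minimal, self-contained sketch of the two accumulation patterns. It does not use Mahout's classes: SparseVec, plus(), addInPlace(), sumWithCopies() and sumInPlace() are hypothetical stand-ins written for this example, and the map-backed vector is only an assumption about how a sparse vector might be stored.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class InPlaceAccumulation {

    /** Hypothetical stand-in for a sparse vector: index -> non-zero value. */
    static final class SparseVec {
        final Map<Integer, Double> entries = new HashMap<>();

        SparseVec set(int index, double value) {
            entries.put(index, value);
            return this;
        }

        double get(int index) {
            return entries.getOrDefault(index, 0.0);
        }

        /** Copying add: clones this vector, then adds other into the clone. */
        SparseVec plus(SparseVec other) {
            SparseVec result = new SparseVec();
            result.entries.putAll(entries); // full copy of the accumulator on every call
            result.addInPlace(other);
            return result;
        }

        /** In-place add: only visits the non-zero entries of other. */
        SparseVec addInPlace(SparseVec other) {
            for (Map.Entry<Integer, Double> e : other.entries.entrySet()) {
                entries.merge(e.getKey(), e.getValue(), Double::sum);
            }
            return this;
        }
    }

    /** The pattern from the issue: acc = acc.plus(part), copying acc each time. */
    static SparseVec sumWithCopies(List<SparseVec> parts) {
        SparseVec acc = new SparseVec();
        for (SparseVec part : parts) {
            acc = acc.plus(part);
        }
        return acc;
    }

    /** The same sum, accumulated in place without cloning the accumulator. */
    static SparseVec sumInPlace(List<SparseVec> parts) {
        SparseVec acc = new SparseVec();
        for (SparseVec part : parts) {
            acc.addInPlace(part);
        }
        return acc;
    }

    public static void main(String[] args) {
        List<SparseVec> parts = Arrays.asList(
            new SparseVec().set(0, 1.0).set(2, 2.0),
            new SparseVec().set(2, 3.0).set(5, 4.0));
        // Both strategies produce the same entries; only the amount of copying differs.
        System.out.println(sumWithCopies(parts).entries.equals(sumInPlace(parts).entries));
    }
}
```

Both methods compute the same sum. The copying variant duplicates the growing accumulator once per input vector, while the in-place variant only touches the entries of each incoming vector.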

Furthermore, I added special handling for PLUS_ABS in the assign() method and 
found that two jobs in RecommenderJob can be map-only, so I removed the 
identity reducers there.

Overall, this should give a huge boost to our recommenders' performance!
                
> Hotspot in RecommenderJob-PartialMultiplyMapper-Reducer
> -------------------------------------------------------
>
>                 Key: MAHOUT-1042
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1042
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>    Affects Versions: 0.6, 0.7
>            Reporter: Bhaskar Devireddy
>            Assignee: Sebastian Schelter
>            Priority: Minor
>             Fix For: 0.8
>
>         Attachments: MAHOUT-1042.patch, Mahout_1042.patch
>
>
> While profiling the PartialMultiplyMapper-Reducer job, we noticed a hotspot 
> consuming more than 40% of the CPU time in the 
> org.apache.mahout.math.RandomAccessSparseVector.assign method of the reducer 
> task.  For profiling, we used the script provided in the Mahout examples for 
> running ASF Email recommendations.  The hotspot comes from the use of the 
> Vector.plus(Vector x) method in the AggregateAndRecommendReducer class.  The 
> pattern used is VectorA = VectorA.plus(VectorB).  In this case, VectorA 
> does not have to be cloned using the assign method.  The attached patch 
> addresses the hotspot by eliminating the cloning in this case for the plus and 
> times methods.  The patch retains functionality (we verified the output with 
> and without it) and speeds up the PartialMultiplyMapper-Reducer job by more 
> than 10X on x86 architectures.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
