[
https://issues.apache.org/jira/browse/MAHOUT-1035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Bhaskar Devireddy updated MAHOUT-1035:
--------------------------------------
Attachment: patch_1035_ver2.patch
I revised the initial patch and incorporated your comments.
The HashMap change was meant to improve the performance of the keys method in
cases where the initial capacity is huge but very few elements are populated.
It won't help in this case, since we are fixing the initial capacity, so I
removed it from the patch.
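
To illustrate why a huge initial capacity with few populated elements makes
keys expensive, here is a minimal sketch of an open-addressing int-to-double
map. It is not Mahout's OpenIntDoubleHashMap: the class name
SparseKeysScanSketch, its fields, and the slotsScanned counter are hypothetical
and exist only for this example. The sketch assumes that keys walks every slot
of the backing array, so its cost follows the capacity rather than the number
of stored entries.

```java
import java.util.Arrays;

// Hypothetical, simplified open-addressing map for illustration only.
// This is NOT the Mahout implementation; all names here are assumptions.
public class SparseKeysScanSketch {
    private static final byte FREE = 0;
    private static final byte FULL = 1;

    private final int[] table;     // stored keys
    private final double[] values; // stored values
    private final byte[] state;    // FREE or FULL per slot
    private int size;

    // Number of slots visited by the most recent keys() call.
    public long slotsScanned;

    public SparseKeysScanSketch(int capacity) {
        table = new int[capacity];
        values = new double[capacity];
        state = new byte[capacity];
    }

    // Linear probing without resizing, so the map must never be filled completely.
    public void put(int key, double value) {
        if (size == table.length) {
            throw new IllegalStateException("sketch does not resize");
        }
        int i = Math.floorMod(key, table.length);
        while (state[i] == FULL && table[i] != key) {
            i = (i + 1) % table.length;
        }
        if (state[i] == FREE) {
            state[i] = FULL;
            table[i] = key;
            size++;
        }
        values[i] = value;
    }

    // Visits every slot: the loop runs `capacity` times even if only a
    // handful of slots are FULL.
    public int[] keys() {
        int[] out = new int[size];
        int j = 0;
        slotsScanned = 0;
        for (int i = 0; i < table.length; i++) {
            slotsScanned++;
            if (state[i] == FULL) {
                out[j++] = table[i];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Same three entries, very different capacities.
        for (int capacity : new int[] {8, 1 << 20}) {
            SparseKeysScanSketch map = new SparseKeysScanSketch(capacity);
            map.put(1, 0.5);
            map.put(2, 1.5);
            map.put(3, 2.5);
            int[] keys = map.keys();
            System.out.println("capacity=" + capacity
                + " keys=" + Arrays.toString(keys)
                + " slotsScanned=" + map.slotsScanned);
        }
    }
}
```

In this sketch both maps hold the same three keys, but the larger one visits
about a million slots to return them. Choosing a capacity close to the expected
number of entries keeps the scan short, which is the kind of saving described
in the issue below.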
> Hotspot in recommenditembased – UnsymmetrifyMapper job
> ------------------------------------------------------
>
> Key: MAHOUT-1035
> URL: https://issues.apache.org/jira/browse/MAHOUT-1035
> Project: Mahout
> Issue Type: Improvement
> Components: Collaborative Filtering
> Affects Versions: 0.7
> Reporter: Bhaskar Devireddy
> Assignee: Sebastian Schelter
> Priority: Minor
> Fix For: 0.8
>
> Attachments: patch_1035.patch, patch_1035_ver2.patch
>
>
> While profiling the unsymmetrify mapper job in recommendations, we noticed a
> hotspot consuming 90% of the CPU runtime in the
> org.apache.mahout.math.map.OpenIntDoubleHashMap.keys method for the first map
> task. For profiling, we used the script provided in the Mahout examples for
> running ASF Email recommendations. The attached patch addresses the hotspot
> by changing the initialization of transposedPartial, which reduces the number
> of for-loop iterations in the OpenIntDoubleHashMap.keys method. The patch
> retains functionality (we verified the output with and without the patch) and
> speeds up the unsymmetrify mapper task by more than 4x on x86 architectures.