[ 
https://issues.apache.org/jira/browse/MAHOUT-1035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13403722#comment-13403722
 ] 

Sebastian Schelter commented on MAHOUT-1035:
--------------------------------------------

Nice find!

I think the problem with the like() call is that the newly created vector has 
the same initial capacity as the vector on which like() is invoked (a row of 
the similarity matrix here). This doesn't make any sense here and potentially 
wastes a lot of memory.

I think we should simply change the creation to 

{noformat}
Vector transposedPartial = new RandomAccessSparseVector(similarities.size(), 1);
{noformat}
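
To illustrate why the initial capacity matters, here is a minimal sketch. It assumes, in line with the issue description, that keys() on an open-addressing hash map scans every slot of the backing table, however few entries are stored. TinyIntDoubleMap and slotsVisitedForSingleEntry are hypothetical names made up for this example. They are not Mahout's OpenIntDoubleHashMap, and the table sizing rule is an assumption as well.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustration only: a hypothetical stand-in, NOT Mahout's OpenIntDoubleHashMap. */
final class CapacityScanSketch {

  /** Tiny open-addressing int->double map with linear probing and no resizing. */
  static final class TinyIntDoubleMap {
    private final int[] keys;
    private final double[] values;
    private final boolean[] used;
    private int size;
    long slotsVisited; // number of table slots touched by keys()

    TinyIntDoubleMap(int initialCapacity) {
      // Assumed sizing rule: twice the requested capacity, at least 2 slots.
      int tableSize = Math.max(2, 2 * initialCapacity);
      keys = new int[tableSize];
      values = new double[tableSize];
      used = new boolean[tableSize];
    }

    void put(int key, double value) {
      int i = Math.floorMod(key, keys.length);
      // Linear probing; terminates because one slot is always kept free.
      while (used[i] && keys[i] != key) {
        i = (i + 1) % keys.length;
      }
      if (!used[i]) {
        if (size + 1 == keys.length) {
          throw new IllegalStateException("table full; this sketch does not resize");
        }
        used[i] = true;
        keys[i] = key;
        size++;
      }
      values[i] = value;
    }

    /** Collects all keys by scanning the whole table, occupied or not. */
    List<Integer> keys() {
      List<Integer> result = new ArrayList<>(size);
      for (int i = 0; i < keys.length; i++) {
        slotsVisited++;
        if (used[i]) {
          result.add(keys[i]);
        }
      }
      return result;
    }
  }

  /** Stores a single entry, calls keys() once, and reports the slots scanned. */
  static long slotsVisitedForSingleEntry(int initialCapacity) {
    TinyIntDoubleMap map = new TinyIntDoubleMap(initialCapacity);
    map.put(42, 0.5);
    map.keys();
    return map.slotsVisited;
  }

  public static void main(String[] args) {
    System.out.println("capacity 100000 -> slots scanned: " + slotsVisitedForSingleEntry(100_000));
    System.out.println("capacity 1      -> slots scanned: " + slotsVisitedForSingleEntry(1));
  }
}
```

In this sketch, the cost of keys() on a map holding one entry grows with the capacity it was created with. If the real map behaves similarly, that would explain why creating the vector with an initial capacity of 1 removes the hotspot.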

I don't get the second part of your patch, where you change 
OpenKeyTypeValueTypeHashMap. IMHO there shouldn't be a way to terminate that 
loop early. Did I overlook something?

                
> Hotspot in recommenditembased – UnsymmetrifyMapper job
> ------------------------------------------------------
>
>                 Key: MAHOUT-1035
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1035
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>    Affects Versions: 0.7
>            Reporter: Bhaskar Devireddy
>            Assignee: Sean Owen
>            Priority: Minor
>             Fix For: 0.8
>
>         Attachments: patch_1035.patch
>
>
> While profiling the unsymmetrify mapper job in recommendations, we noticed a 
> hotspot consuming 90% of the CPU runtime in the 
> org.apache.mahout.math.map.OpenIntDoubleHashMap.keys method for the first map 
> task.  We used the script provided in the Mahout examples for running ASF Email 
> recommendations for profiling.  The attached patch addresses the hotspot by 
> reducing the number of for loop iterations in the OpenIntDoubleHashMap.keys 
> method by changing the initialization of transposedPartial.  The patch 
> retains functionality (we verified the output with and without the patch) and 
> speeds up the unsymmetrify mapper task by more than 4X on x86 architectures.


