[ https://issues.apache.org/jira/browse/MAHOUT-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14028285#comment-14028285 ]

ASF GitHub Bot commented on MAHOUT-1578:
----------------------------------------

Github user sscdotopen commented on the pull request:

    https://github.com/apache/mahout/pull/16#issuecomment-45788881
  
    I included some of these fixes in the experiments I ran last week. I did
not measure the impact of individual fixes explicitly, but changes like
directly setting an array of row vectors instead of assigning every row (which,
for sequential sparse vectors, can mean inserting each entry via binary search)
seem to have made a huge difference.
    
    I had jobs that seemed to hang for minutes, and when I ran jstack on the
workers, I saw that the code was busily wasting CPU assigning entries to
sparse rows during matrix deserialization...
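
The cost described in this comment can be illustrated with a small, self-contained model. This is a hypothetical sketch, not Mahout code: `SortedSparseRow`, `set`, and `rebuild` are illustrative names. It assumes a row stored as sorted parallel index/value arrays, where each new entry is placed with a binary search and every later entry is shifted one slot to the right, and it counts those shifts for two insertion orders.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch (not Mahout's real classes): a sparse row stored as
 * parallel, sorted index/value arrays, similar in spirit to a
 * sequential-access sparse vector.
 */
public class SparseRowAssignSketch {

    static final class SortedSparseRow {
        int[] indices = new int[4];
        double[] values = new double[4];
        int size = 0;
        long shifted = 0; // entries moved right to open a slot (proxy for wasted work)

        /** Binary search for the slot, then shift every later entry by one. */
        void set(int index, double value) {
            int pos = Arrays.binarySearch(indices, 0, size, index);
            if (pos >= 0) { // index already present: overwrite in place
                values[pos] = value;
                return;
            }
            int insertAt = -pos - 1;
            if (size == indices.length) { // grow both arrays
                indices = Arrays.copyOf(indices, size * 2);
                values = Arrays.copyOf(values, size * 2);
            }
            int toMove = size - insertAt;
            System.arraycopy(indices, insertAt, indices, insertAt + 1, toMove);
            System.arraycopy(values, insertAt, values, insertAt + 1, toMove);
            shifted += toMove;
            indices[insertAt] = index;
            values[insertAt] = value;
            size++;
        }
    }

    /**
     * Builds a row entry by entry in the given order, modelling an
     * element-wise copy whose source does not yield ascending indices.
     */
    static SortedSparseRow rebuild(int[] insertionOrder) {
        SortedSparseRow row = new SortedSparseRow();
        for (int index : insertionOrder) {
            row.set(index, 1.0);
        }
        return row;
    }

    public static void main(String[] args) {
        int n = 10_000;
        int[] descending = new int[n];
        int[] ascending = new int[n];
        for (int i = 0; i < n; i++) {
            descending[i] = n - 1 - i;
            ascending[i] = i;
        }
        // Worst case: every insert lands at the front, so n(n-1)/2 entries move.
        System.out.println(rebuild(descending).shifted); // prints 49995000
        // Best case: every insert appends, so nothing moves.
        System.out.println(rebuild(ascending).shifted); // prints 0
        // Storing an already-deserialized row object in the matrix's row array
        // skips this per-entry loop entirely.
    }
}
```

Under these assumptions the shift count grows quadratically with the number of non-zeros when entries arrive out of order, which is one plausible reason an entry-by-entry assign can dominate deserialization time; reusing the deserialized row instance avoids the loop regardless of order.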


> Optimizations in matrix serialization
> -------------------------------------
>
>                 Key: MAHOUT-1578
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1578
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Math
>            Reporter: Sebastian Schelter
>             Fix For: 1.0
>
>
> MatrixWritable contains inefficient code in a few places:
>  
>  * type and size are stored with every vector, although they are the same for
> all vectors in the matrix
>  * in some places, vectors are added to the matrix via assign() where we
> could use the instance directly
>  
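
The first point in the issue description can be sketched as two wire formats. This is a minimal sketch under stated assumptions, not `MatrixWritable`'s real layout: the class name, the made-up `SPARSE_ROW` tag, and the field order are all hypothetical, and each row is encoded as (index, value) pairs with `java.io.DataOutputStream`. The reader also stores each decoded row directly in the result array, echoing the second point.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/**
 * Hypothetical wire formats (not MatrixWritable's actual layout) for a matrix
 * whose rows all share the same vector type and cardinality.
 */
public class HeaderOnceSketch {

    static final byte SPARSE_ROW = 1; // made-up type tag

    /** Naive layout: every row repeats the type tag and the cardinality. */
    static byte[] writePerRowHeaders(int[][] idx, double[][] val, int numCols) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(idx.length);
            for (int r = 0; r < idx.length; r++) {
                out.writeByte(SPARSE_ROW); // 1 byte, repeated per row
                out.writeInt(numCols);     // 4 bytes, repeated per row
                writeEntries(out, idx[r], val[r]);
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Compact layout: type tag and cardinality are written once per matrix. */
    static byte[] writeSharedHeader(int[][] idx, double[][] val, int numCols) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(idx.length);
            out.writeByte(SPARSE_ROW);
            out.writeInt(numCols);
            for (int r = 0; r < idx.length; r++) {
                writeEntries(out, idx[r], val[r]);
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static void writeEntries(DataOutputStream out, int[] idx, double[] val)
            throws IOException {
        out.writeInt(idx.length); // number of non-zeros in this row
        for (int k = 0; k < idx.length; k++) {
            out.writeInt(idx[k]);
            out.writeDouble(val[k]);
        }
    }

    /** Reads the compact layout; each decoded row goes straight into the result. */
    static double[][] readSharedHeader(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            int numRows = in.readInt();
            byte type = in.readByte();
            if (type != SPARSE_ROW) {
                throw new IOException("unexpected type tag " + type);
            }
            int numCols = in.readInt();
            double[][] rows = new double[numRows][];
            for (int r = 0; r < numRows; r++) {
                double[] row = new double[numCols];
                int nnz = in.readInt();
                for (int k = 0; k < nnz; k++) {
                    int col = in.readInt();
                    row[col] = in.readDouble();
                }
                rows[r] = row; // keep this instance; no element-wise copy
            }
            return rows;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[][] idx = {{0, 3}, {1}, {2, 4}};
        double[][] val = {{1.0, 2.0}, {3.0}, {4.0, 5.0}};
        byte[] naive = writePerRowHeaders(idx, val, 5);
        byte[] compact = writeSharedHeader(idx, val, 5);
        System.out.println(naive.length + " bytes vs " + compact.length + " bytes");
        // prints "91 bytes vs 81 bytes"
    }
}
```

With a 1-byte tag and a 4-byte cardinality, the shared header saves 5 bytes for every row after the first; the saving is small per row but scales with the number of rows, and the reader no longer has to parse and check a header for each vector.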



--
This message was sent by Atlassian JIRA
(v6.2#6252)
