[ https://issues.apache.org/jira/browse/MAHOUT-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14504406#comment-14504406 ]

Andrew Palumbo commented on MAHOUT-1693:
----------------------------------------

So it seems like the real memory hog here is:

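{code}
  // Matrix toString() as excerpted (requires java.util.Iterator).
  @Override
  public String toString() {
    StringBuilder s = new StringBuilder("{\n");
    Iterator<MatrixSlice> it = iterator();
    while (it.hasNext()) {
      MatrixSlice next = it.next();
      // One line per row: "  <rowIndex>  =>\t<row vector>". Appending
      // next.vector() renders every entry of the row, so the builder ends
      // up holding all rows x cols values as text.
      s.append("  ").append(next.index()).append("  =>\t").append(next.vector()).append('\n');
    }
    s.append("}");
    return s.toString();
  }
{code}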

i.e., each time a large in-core matrix is the result of an operation or a
function within the spark-shell, its toString() method is called (even though
the shell itself truncates the output). So a 5000 x 5000 dense matrix of
doubles actually tries to build a String representation of 25,000,000 doubles.
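As a rough check on the arithmetic for the 5000 x 5000 case, the sketch below compares the dense storage of the doubles with the text that toString() would build. The class and method names are hypothetical, and the average of 20 characters per rendered entry is an assumption, not a measurement.

```java
// Hypothetical back-of-the-envelope estimate; CHARS_PER_ENTRY is an assumption.
public final class ToStringCostEstimate {
    // Assumed average characters per rendered entry: a full-precision double
    // often needs 17-20 characters, plus separators.
    public static final int CHARS_PER_ENTRY = 20;
    // Java 8 strings store UTF-16 chars, 2 bytes each.
    public static final int BYTES_PER_CHAR = 2;

    public static long entries(long rows, long cols) {
        return rows * cols;
    }

    // Raw storage for the values themselves (8 bytes per double).
    public static long denseBytes(long rows, long cols) {
        return entries(rows, cols) * Double.BYTES;
    }

    // Estimated size of the final rendered text under the assumptions above.
    public static long stringBytes(long rows, long cols) {
        return entries(rows, cols) * CHARS_PER_ENTRY * BYTES_PER_CHAR;
    }

    public static void main(String[] args) {
        long n = 5000;
        System.out.println("entries      = " + entries(n, n));     // 25000000
        System.out.println("dense bytes  = " + denseBytes(n, n));  // 200000000 (~200 MB)
        System.out.println("string bytes ~ " + stringBytes(n, n)); // 1000000000 (~1 GB)
    }
}
```

Under these assumptions the rendered text alone is roughly five times the ~200 MB of dense storage. Because StringBuilder grows by copying its buffer and the final s.toString() copies it once more, the transient peak is likely higher still.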




> FunctionalMatrixView materializes row vectors in scala shell
> ------------------------------------------------------------
>
>                 Key: MAHOUT-1693
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1693
>             Project: Mahout
>          Issue Type: Bug
>          Components: Mahout spark shell, Math
>    Affects Versions: 0.10.0
>            Reporter: Suneel Marthi
>            Assignee: Andrew Palumbo
>            Priority: Blocker
>             Fix For: 0.10.1
>
>
> FunctionalMatrixView materializes row vectors in scala shell.
> Problem first reported by user Michael Alton (Intel):
> "When I first tried to make a large matrix, I got an out of Java heap space 
> error. I increased the memory incrementally until I got it to work. “export 
> MAHOUT_HEAPSIZE=8000” didn’t work, but “export MAHOUT_HEAPSIZE=64000” did. 
> The question is why do we need so much memory? A 5000x5000 matrix of doubles 
> should only take up ~200MB of space?"
> The problem has been narrowed down to FunctionalMatrixView not overriding the
> toString() method, which causes it to materialize all of the row vectors
> when run in the Mahout Spark Shell.
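The quoted description attributes the blow-up to FunctionalMatrixView not overriding toString(), so the inherited toString() evaluates and renders every row. One possible mitigation, sketched here under assumptions and not necessarily the fix committed for MAHOUT-1693, is an override that evaluates only a bounded top-left corner of a function-backed view. BoundedFunctionalView, CellFunction, and the 10 x 10 limit are hypothetical names and values chosen for illustration; they are not Mahout's API.

```java
// Hypothetical sketch: a lazily evaluated, function-backed matrix view whose
// toString() renders only a bounded top-left corner. Not Mahout's API.
public final class BoundedFunctionalView {
    /** Hypothetical cell function: computes the value at (row, col) on demand. */
    public interface CellFunction {
        double apply(int row, int col);
    }

    private final int rows;
    private final int cols;
    private final CellFunction f;

    public BoundedFunctionalView(int rows, int cols, CellFunction f) {
        this.rows = rows;
        this.cols = cols;
        this.f = f;
    }

    // Each access evaluates the function; nothing is stored.
    public double get(int row, int col) {
        return f.apply(row, col);
    }

    /** Renders at most maxRows x maxCols cells; the rest is elided with "...". */
    public String toString(int maxRows, int maxCols) {
        StringBuilder s = new StringBuilder("{\n");
        int shownRows = Math.min(rows, maxRows);
        int shownCols = Math.min(cols, maxCols);
        for (int i = 0; i < shownRows; i++) {
            s.append("  ").append(i).append("  =>\t{");
            for (int j = 0; j < shownCols; j++) {
                if (j > 0) s.append(',');
                s.append(j).append(':').append(get(i, j));
            }
            if (shownCols < cols) s.append(",...");
            s.append("}\n");
        }
        if (shownRows < rows) s.append("  ...\n");
        return s.append('}').toString();
    }

    @Override
    public String toString() {
        // At most 10 x 10 = 100 cells are evaluated, whatever the view's size.
        return toString(10, 10);
    }
}
```

For example, printing a 5000 x 5000 view built with (i, j) -> i + j calls the function 100 times rather than 25,000,000 times, and the resulting string stays a few kilobytes.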



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
