[
https://issues.apache.org/jira/browse/MAHOUT-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14504406#comment-14504406
]
Andrew Palumbo edited comment on MAHOUT-1693 at 4/21/15 6:14 AM:
-----------------------------------------------------------------
bq. The question is why do we need so much memory? A 5000x5000 matrix of doubles
should only take up ~200MB of space?
So it seems like the real memory hog here is:
{code}
public String toString() {
  StringBuilder s = new StringBuilder("{\n");
  Iterator<MatrixSlice> it = iterator();
  while (it.hasNext()) {
    MatrixSlice next = it.next();
    // Appends the full text of every row vector, so the whole matrix is rendered.
    s.append(" ").append(next.index()).append(" =>\t").append(next.vector()).append('\n');
  }
  s.append("}");
  return s.toString();
}
{code}
That is, each time a large in-core matrix is the result of an operation or a
function within the spark-shell, the toString() method is called on it (though
the output is truncated by the shell itself).
So if the result of an operation or function is, e.g., a dense matrix of 5000 x
5000 doubles, the spark-shell actually tries to create a String representation
of 25,000,000 doubles.
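To put rough numbers on this, here is a minimal sketch (not Mahout code; the class and method names are made up for illustration) that compares the raw size of the values with the number of characters their text form needs. It assumes each value is rendered with Double.toString plus a one-character separator, and it ignores row labels and StringBuilder growth.

```java
import java.util.Random;

public class ToStringFootprint {
    /** Bytes needed for the raw values of a dense rows x cols matrix of doubles. */
    public static long rawBytes(long rows, long cols) {
        return rows * cols * Double.BYTES;
    }

    /**
     * Characters produced if every row looks like sampleRow and each value is
     * rendered with Double.toString followed by one separator character.
     */
    public static long renderedChars(double[] sampleRow, long rows) {
        long charsPerRow = 0;
        for (double v : sampleRow) {
            charsPerRow += Double.toString(v).length() + 1;
        }
        return charsPerRow * rows;
    }

    public static void main(String[] args) {
        int n = 5000;
        Random rnd = new Random(42);
        double[] row = new double[n];
        for (int i = 0; i < n; i++) {
            row[i] = rnd.nextDouble();
        }
        System.out.println("raw bytes:      " + rawBytes(n, n));
        System.out.println("rendered chars: " + renderedChars(row, n));
    }
}
```

On JVMs that store String characters as 2-byte UTF-16 units, each rendered character costs two bytes, and the text is held both in the StringBuilder and in the copy made by toString(), so the heap needed can be a multiple of the raw 200 MB.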
was (Author: andrew_palumbo):
So it seems like the real memory hog here is:
{code}
public String toString() {
  StringBuilder s = new StringBuilder("{\n");
  Iterator<MatrixSlice> it = iterator();
  while (it.hasNext()) {
    MatrixSlice next = it.next();
    s.append(" ").append(next.index()).append(" =>\t").append(next.vector()).append('\n');
  }
  s.append("}");
  return s.toString();
}
{code}
That is, each time a large in-core matrix is the result of an operation or a
function within the spark-shell, the toString() method is called on it (though
the output is truncated by the shell itself). So a dense matrix of 5000 x 5000
doubles actually tries to create a String representation of 25,000,000 doubles.
> FunctionalMatrixView materializes row vectors in scala shell
> ------------------------------------------------------------
>
> Key: MAHOUT-1693
> URL: https://issues.apache.org/jira/browse/MAHOUT-1693
> Project: Mahout
> Issue Type: Bug
> Components: Mahout spark shell, Math
> Affects Versions: 0.10.0
> Reporter: Suneel Marthi
> Assignee: Andrew Palumbo
> Priority: Blocker
> Fix For: 0.10.1
>
>
> FunctionalMatrixView materializes row vectors in scala shell.
> Problem first reported by a user, Michael Alton (Intel):
> "When I first tried to make a large matrix, I got an out of Java heap space
> error. I increased the memory incrementally until I got it to work. “export
> MAHOUT_HEAPSIZE=8000” didn’t work, but “export MAHOUT_HEAPSIZE=64000” did.
> The question is why do we need so much memory? A 5000x5000 matrix of doubles
> should only take up ~200MB of space?"
> The problem has been narrowed down to FunctionalMatrixView not overriding the
> toString() method, which causes it to materialize all of the row vectors
> when run in the Mahout Spark Shell.
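
A minimal sketch of one way to keep toString() from materializing every row, assuming a cap on the number of rows rendered. This is not the actual Mahout fix for this issue: RowSource, summarize and MAX_ROWS are hypothetical names, and RowSource only stands in for a view whose rows are computed on demand.

```java
import java.util.Arrays;

public class BoundedToString {
    /** Hypothetical stand-in for a matrix view whose rows are computed on demand. */
    public interface RowSource {
        int numRows();
        int numCols();
        double[] row(int index); // materializes a single row
    }

    /** Arbitrary cap on the number of rows rendered (an assumption of this sketch). */
    public static final int MAX_ROWS = 3;

    /** Renders the dimensions and at most MAX_ROWS rows; other rows are never requested. */
    public static String summarize(RowSource m) {
        StringBuilder s = new StringBuilder();
        s.append(m.numRows()).append(" x ").append(m.numCols()).append(" matrix {\n");
        int shown = Math.min(m.numRows(), MAX_ROWS);
        for (int i = 0; i < shown; i++) {
            s.append(" ").append(i).append(" =>\t")
             .append(Arrays.toString(m.row(i))).append('\n');
        }
        if (shown < m.numRows()) {
            s.append(" ... ").append(m.numRows() - shown).append(" more rows\n");
        }
        s.append("}");
        return s.toString();
    }
}
```

With a cap like this, the cost of printing depends on MAX_ROWS and the row width rather than on the total number of rows, which is the property a shell that truncates its output anyway would want.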
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)