[ 
https://issues.apache.org/jira/browse/SYSTEMML-512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15153678#comment-15153678
 ] 

Matthias Boehm commented on SYSTEMML-512:
-----------------------------------------

[~mwdus...@us.ibm.com] I'm afraid not (other than using too conservative 
internal memory fractions, which are currently at 70%+15% for the driver). For 
yarn/mapreduce it is quite common that map/reduce task jvm opts are already 
configured w/ -Xmn or -XX:NewRatio. In our old readme, we recommended the same 
for HADOOP_CLIENT_OPTS. For Spark, I would say we should (1) update our 
sparkDML to do this automatically, and (2) also add this to our future 
troubleshooting guide. 

> DML Script With UDFs Results In Out Of Memory Error As Compared to Without 
> UDFs
> -------------------------------------------------------------------------------
>
>                 Key: SYSTEMML-512
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-512
>             Project: SystemML
>          Issue Type: Bug
>            Reporter: Mike Dusenberry
>         Attachments: test1.scala, test2.scala
>
>
> Currently, the following script for running a simple version of Poisson 
> non-negative matrix factorization (PNMF) runs in linear time as desired:
> {code}
> # data & args
> X = read($X)
> X = X+1 # change product IDs to be 1-based, rather than 0-based
> V = table(X[,1], X[,2])
> V = V[1:$size,1:$size]
> max_iteration = as.integer($maxiter)
> rank = as.integer($rank)
> # run PNMF
> n = nrow(V)
> m = ncol(V)
> range = 0.01
> W = Rand(rows=n, cols=rank, min=0, max=range, pdf="uniform")
> H = Rand(rows=rank, cols=m, min=0, max=range, pdf="uniform")
> i=0
> while(i < max_iteration) {
>   H = (H * (t(W) %*% (V/(W%*%H))))/t(colSums(W)) 
>   W = (W * ((V/(W%*%H)) %*% t(H)))/t(rowSums(H))
>   i = i + 1;
> }
> # compute negative log-likelihood
> negloglik_temp = -1 * (sum(V*log(W%*%H)) - as.scalar(colSums(W)%*%rowSums(H)))
> # write outputs
> negloglik = matrix(negloglik_temp, rows=1, cols=1)
> write(negloglik, $negloglikout)
> write(W, $Wout)
> write(H, $Hout)
> {code}
> However, a small refactoring of this same script to pull the core PNMF 
> algorithm and the negative log-likelihood computation out into separate UDFs 
> results in non-linear runtime and a Java out of memory heap error on the same 
> dataset.  
> {code}
> pnmf = function(matrix[double] V, integer max_iteration, integer rank) return 
> (matrix[double] W, matrix[double] H) {
>     n = nrow(V)
>     m = ncol(V)
>     
>     range = 0.01
>     W = Rand(rows=n, cols=rank, min=0, max=range, pdf="uniform")
>     H = Rand(rows=rank, cols=m, min=0, max=range, pdf="uniform")
>     
>     i=0
>     while(i < max_iteration) {
>       H = (H * (t(W) %*% (V/(W%*%H))))/t(colSums(W)) 
>       W = (W * ((V/(W%*%H)) %*% t(H)))/t(rowSums(H))
>       i = i + 1;
>     }
> }
> negloglikfunc = function(matrix[double] V, matrix[double] W, matrix[double] 
> H) return (double negloglik) {
>     negloglik = -1 * (sum(V*log(W%*%H)) - as.scalar(colSums(W)%*%rowSums(H)))
> }
> # data & args
> X = read($X)
> X = X+1 # change product IDs to be 1-based, rather than 0-based
> V = table(X[,1], X[,2])
> V = V[1:$size,1:$size]
> max_iteration = as.integer($maxiter)
> rank = as.integer($rank)
> # run PNMF and evaluate
> [W, H] = pnmf(V, max_iteration, rank)
> negloglik_temp = negloglikfunc(V, W, H)
> # write outputs
> negloglik = matrix(negloglik_temp, rows=1, cols=1)
> write(negloglik, $negloglikout)
> write(W, $Wout)
> write(H, $Hout)
> {code}
> The expectation would be that such modularization at the DML level should be 
> allowed without any impact on performance.
> Details:
> - Data: Amazon product co-purchasing dataset from Stanford 
> [http://snap.stanford.edu/data/amazon0601.html | 
> http://snap.stanford.edu/data/amazon0601.html]
> - Execution mode: Spark {{MLContext}}, but should be applicable to 
> command-line invocation as well. 
> - Error message:
> {code}
> java.lang.OutOfMemoryError: Java heap space
>       at 
> org.apache.sysml.runtime.matrix.data.MatrixBlock.allocateDenseBlock(MatrixBlock.java:415)
>       at 
> org.apache.sysml.runtime.matrix.data.MatrixBlock.sparseToDense(MatrixBlock.java:1212)
>       at 
> org.apache.sysml.runtime.matrix.data.MatrixBlock.examSparsity(MatrixBlock.java:1103)
>       at 
> org.apache.sysml.runtime.instructions.cp.MatrixMatrixArithmeticCPInstruction.processInstruction(MatrixMatrixArithmeticCPInstruction.java:60)
>       at 
> org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:309)
>       at 
> org.apache.sysml.runtime.controlprogram.ProgramBlock.executeInstructions(ProgramBlock.java:227)
>       at 
> org.apache.sysml.runtime.controlprogram.ProgramBlock.execute(ProgramBlock.java:169)
>       at 
> org.apache.sysml.runtime.controlprogram.WhileProgramBlock.execute(WhileProgramBlock.java:183)
>       at 
> org.apache.sysml.runtime.controlprogram.FunctionProgramBlock.execute(FunctionProgramBlock.java:115)
>       at 
> org.apache.sysml.runtime.instructions.cp.FunctionCallCPInstruction.processInstruction(FunctionCallCPInstruction.java:177)
>       at 
> org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:309)
>       at 
> org.apache.sysml.runtime.controlprogram.ProgramBlock.executeInstructions(ProgramBlock.java:227)
>       at 
> org.apache.sysml.runtime.controlprogram.ProgramBlock.execute(ProgramBlock.java:169)
>       at 
> org.apache.sysml.runtime.controlprogram.Program.execute(Program.java:146)
>       at 
> org.apache.sysml.api.MLContext.executeUsingSimplifiedCompilationChain(MLContext.java:1387)
>       at 
> org.apache.sysml.api.MLContext.compileAndExecuteScript(MLContext.java:1252)
>       at org.apache.sysml.api.MLContext.executeScript(MLContext.java:1184)
>       at org.apache.sysml.api.MLContext.executeScript(MLContext.java:1165)
>       at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:113)
>       at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:103)
>       at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>       at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>       at scala.collection.immutable.Range.foreach(Range.scala:141)
>       at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>       at scala.collection.AbstractTraversable.map(Traversable.scala:105)
>       at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:103)
>       at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:135)
>       at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:137)
>       at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:139)
>       at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:141)
>       at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:143)
>       at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:145)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to