[ https://issues.apache.org/jira/browse/SYSTEMML-455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matthias Boehm closed SYSTEMML-455.
-----------------------------------

> OOM CP transpose in Spark hybrid mode
> --------------------------------------
>
>                 Key: SYSTEMML-455
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-455
>             Project: SystemML
>          Issue Type: Bug
>          Components: Compiler
>            Reporter: Matthias Boehm
>            Assignee: Matthias Boehm
>             Fix For: SystemML 1.0
>
>
> The following data generation script failed with an OOM in hybrid_spark
> execution mode (config: 20GB driver memory), whereas the same script runs
> fine with the same memory budget in hybrid_mr execution mode.
> {code}
> n = 30000;
> B = Rand (rows = n, cols = n, min = -1, max = 1, pdf = "uniform", seed = 1234);
> v = exp (Rand (rows = n, cols = 1, min = -3, max = 3, pdf = "uniform", seed = 5678));
> A = t(B) %*% (B * v);
> write(A, "./tmp/A", format="binary");
> {code}
> The resulting hop explain output is as follows:
> {code}
> # Memory Budget local/remote = 13739MB/184320MB/8602MB
> # Degree of Parallelism (vcores) local/remote = 16/120
> PROGRAM
> --MAIN PROGRAM
> ----GENERIC (lines 4-12) [recompile=true]
> ------(10) dg(rand) [30000,30000,1000,1000,900000000] [0,0,6866 -> 6866MB], CP
> ------(21) r(t) (10) [30000,30000,1000,1000,900000000] [6866,0,6866 -> 13733MB], CP
> ------(19) dg(rand) [30000,1,1000,1000,30000] [0,0,0 -> 0MB], CP
> ------(20) u(exp) (19) [30000,1,1000,1000,-1] [0,0,0 -> 0MB], CP
> ------(22) b(*) (10,20) [30000,30000,1000,1000,-1] [6867,0,6866 -> 13733MB], CP
> ------(23) ba(+*) (21,22) [30000,30000,1000,1000,-1] [13733,6866,6866 -> 27466MB], SPARK
> ------(28) PWrite A (23) [30000,30000,1000,1000,-1] [6866,0,0 -> 6866MB], CP
> {code}
> The script fails at the CP transpose with
> {code}
> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>     at org.apache.sysml.runtime.matrix.data.MatrixBlock.allocateDenseBlock(MatrixBlock.java:414)
>     at org.apache.sysml.runtime.matrix.data.LibMatrixReorg.transposeDenseToDense(LibMatrixReorg.java:752)
>     at org.apache.sysml.runtime.matrix.data.LibMatrixReorg.transpose(LibMatrixReorg.java:136)
>     at org.apache.sysml.runtime.matrix.data.LibMatrixReorg.reorg(LibMatrixReorg.java:105)
>     at org.apache.sysml.runtime.matrix.data.MatrixBlock.reorgOperations(MatrixBlock.java:3458)
>     at org.apache.sysml.runtime.instructions.cp.ReorgCPInstruction.processInstruction(ReorgCPInstruction.java:129)
> {code}
> It's noteworthy that the failing CP instruction requires 13733MB at a memory
> budget of 13739MB. The current guess is that Spark itself incurs substantial
> memory overhead, which eventually leads to the OOM - we should adjust our
> memory budget in Spark execution modes to account for this overhead.
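As a quick sanity check of the estimates above (back-of-the-envelope arithmetic only, derived from the sizes in the explain output rather than from SystemML internals): a dense 30000 x 30000 matrix of doubles occupies 30000 * 30000 * 8 bytes, roughly 6866MB, and an out-of-place dense transpose must hold input and output simultaneously, which yields the 13733MB estimate that almost exactly exhausts the 13739MB local budget.

{code}
// Back-of-the-envelope check of the hop memory estimates above (illustrative only).
public class MemEstimateCheck {
    public static void main(String[] args) {
        long n = 30000;
        double denseMB = (double) n * n * 8 / (1024 * 1024); // one dense double matrix: ~6866MB
        double transposeMB = 2 * denseMB; // out-of-place transpose holds input + output: ~13733MB
        System.out.printf("dense: %.0fMB, transpose peak: %.0fMB%n", denseMB, transposeMB);
    }
}
{code}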
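A minimal sketch of the proposed adjustment, assuming the fix takes the form of an extra utilization factor on the local budget in Spark execution modes; all names and the 0.9 value below are hypothetical placeholders, not SystemML's actual constants or API:

{code}
// Hypothetical sketch: discount the local (CP) memory budget under Spark
// execution modes to leave headroom for Spark's own driver-side memory use.
// MEM_UTIL_FACTOR mirrors the usual fraction-of-max-heap heuristic; the
// 0.9 Spark discount is an assumed value, not taken from the issue.
public class LocalBudgetSketch {
    static final double MEM_UTIL_FACTOR = 0.7;
    static final double SPARK_OVERHEAD_FACTOR = 0.9;

    static long localMemBudget(boolean sparkExecMode) {
        long budget = (long) (MEM_UTIL_FACTOR * Runtime.getRuntime().maxMemory());
        return sparkExecMode ? (long) (SPARK_OVERHEAD_FACTOR * budget) : budget;
    }
}
{code}

With such a discount, the 13733MB transpose (hop 21) would exceed the reduced local budget and be compiled to a SPARK reorg instruction rather than a CP one.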