[ https://issues.apache.org/jira/browse/SYSTEMML-994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthias Boehm resolved SYSTEMML-994.
-------------------------------------
    Resolution: Fixed
    Fix Version/s: SystemML 0.11

> GC OOM: Binary Matrix to Frame Conversion
> -----------------------------------------
>
>                 Key: SYSTEMML-994
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-994
>             Project: SystemML
>          Issue Type: Bug
>            Reporter: Mike Dusenberry
>            Assignee: Matthias Boehm
>            Priority: Blocker
>             Fix For: SystemML 0.11
>
>
> I currently have a SystemML matrix saved to HDFS in binary block format and
> am attempting to read it in, convert it to a {{frame}}, and then pass that
> to an algorithm so that I can pull batches out of it with minimal overhead.
> When attempting to run this, I repeatedly hit the following GC limit:
> {code}
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>     at org.apache.sysml.runtime.matrix.data.FrameBlock.ensureAllocatedColumns(FrameBlock.java:281)
>     at org.apache.sysml.runtime.matrix.data.FrameBlock.copy(FrameBlock.java:979)
>     at org.apache.sysml.runtime.matrix.data.FrameBlock.copy(FrameBlock.java:965)
>     at org.apache.sysml.runtime.matrix.data.FrameBlock.<init>(FrameBlock.java:91)
>     at org.apache.sysml.runtime.instructions.spark.utils.FrameRDDAggregateUtils$CreateBlockCombinerFunction.call(FrameRDDAggregateUtils.java:57)
>     at org.apache.sysml.runtime.instructions.spark.utils.FrameRDDAggregateUtils$CreateBlockCombinerFunction.call(FrameRDDAggregateUtils.java:48)
>     at org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1.apply(JavaPairRDD.scala:1015)
>     at org.apache.spark.util.collection.ExternalSorter$$anonfun$5.apply(ExternalSorter.scala:187)
>     at org.apache.spark.util.collection.ExternalSorter$$anonfun$5.apply(ExternalSorter.scala:186)
>     at org.apache.spark.util.collection.AppendOnlyMap.changeValue(AppendOnlyMap.scala:148)
>     at org.apache.spark.util.collection.SizeTrackingAppendOnlyMap.changeValue(SizeTrackingAppendOnlyMap.scala:32)
>     at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:192)
>     at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:64)
>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>     at org.apache.spark.scheduler.Task.run(Task.scala:89)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> {code}
> Script:
> {code}
> train = read("train")
> val = read("val")
> trainf = as.frame(train)
> valf = as.frame(val)
> # Rest of algorithm, which passes the frames to DML functions and performs
> # row indexing to pull out batches, convert them to matrices, and train.
> {code}
> Cluster setup:
> * Spark Standalone
> * 1 Master, 9 Workers
> * 47 cores, 124 GB available to Spark on each Worker (1 core and 1 GB reserved for the OS)
> * spark.driver.memory 80g
> * spark.executor.memory 21g
> * spark.executor.cores 3
> * spark.default.parallelism 20000
> * spark.driver.maxResultSize 0
> * spark.akka.frameSize 128
> * spark.network.timeout 1000s
> Note: This is using the latest build as of 09.29.16, 1:30 PM PST.
> cc [~mboehm7], [~acs_s]
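For reference, the batching pattern the reporter's script alludes to might look like the following DML sketch. It is illustrative only: the function name {{train_batches}}, the batch size, and the elided training step are hypothetical placeholders rather than part of the original report; only {{as.frame}}, frame row indexing, and {{as.matrix}} correspond to what the script describes.

{code}
# Hedged sketch of the batching pattern described in the script above.
# The function name, batch size, and training step are hypothetical.
train_batches = function(Frame[String] F, Integer batch_size)
    return (Integer num_batches) {
  num_batches = as.integer(ceil(nrow(F) / batch_size))
  for (i in 1:num_batches) {
    beg = (i - 1) * batch_size + 1
    end = min(i * batch_size, nrow(F))
    # Row-indexed slice of the frame, cast back to a matrix for training.
    X_batch = as.matrix(F[beg:end, ])
    # ... train on X_batch ...
  }
}

trainf = as.frame(read("train"))
nb = train_batches(trainf, 64)
{code}

Note that, per the stack trace above, the reported OOM appears to occur before any such batching runs, during the {{as.frame}} conversion itself (in {{FrameRDDAggregateUtils}}).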