[ https://issues.apache.org/jira/browse/SPARK-1392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13957379#comment-13957379 ]
Pat McDonough commented on SPARK-1392:
--------------------------------------

Running the following with the attached data results in the errors below:

{code}
scala> val explore = sc.textFile("/Users/pat/Projects/training-materials/Data/wiki_links")
...
scala> explore.cache
res1: org.apache.spark.rdd.RDD[String] = MappedRDD[1] at textFile at <console>:12
...
scala> explore.count
...
14/04/01 22:52:48 INFO HadoopRDD: Input split: file:/Users/pat/Projects/training-materials/Data/wiki_links/part-00007:0+25009430
14/04/01 22:52:54 INFO MemoryStore: ensureFreeSpace(55520836) called with curMem=271402430, maxMem=309225062
14/04/01 22:52:54 INFO MemoryStore: Will not store rdd_1_7 as it would require dropping another block from the same RDD
14/04/01 22:52:54 INFO BlockManager: Dropping block rdd_1_7 from memory
14/04/01 22:52:54 WARN BlockManager: Block rdd_1_7 could not be dropped from memory as it does not exist
14/04/01 22:52:54 INFO BlockManagerMaster: Updated info of block rdd_1_7
14/04/01 22:52:54 INFO BlockManagerMaster: Updated info of block rdd_1_7
14/04/01 22:52:54 INFO Executor: Serialized size of result for 7 is 563
14/04/01 22:52:54 INFO Executor: Sending result for 7 directly to driver
14/04/01 22:52:54 INFO Executor: Finished task ID 7
14/04/01 22:52:54 INFO TaskSetManager: Starting task 0.0:8 as TID 8 on executor localhost: localhost (PROCESS_LOCAL)
14/04/01 22:52:54 INFO TaskSetManager: Serialized task 0.0:8 as 1606 bytes in 2 ms
14/04/01 22:52:54 INFO Executor: Running task ID 8
14/04/01 22:52:54 INFO TaskSetManager: Finished TID 7 in 6714 ms on localhost (progress: 7/10)
14/04/01 22:52:54 INFO DAGScheduler: Completed ResultTask(0, 7)
14/04/01 22:52:54 INFO BlockManager: Found block broadcast_0 locally
14/04/01 22:52:54 INFO CacheManager: Partition rdd_1_8 not found, computing it
14/04/01 22:52:54 INFO HadoopRDD: Input split: file:/Users/pat/Projects/training-materials/Data/wiki_links/part-00008:0+25904930
14/04/01 22:52:59 INFO TaskSetManager: Starting task 0.0:9 as TID 9 on executor localhost: localhost (PROCESS_LOCAL)
14/04/01 22:52:59 ERROR Executor: Exception in task ID 8
{code}

{noformat}
java.lang.OutOfMemoryError: GC overhead limit exceeded
	at java.nio.HeapCharBuffer.<init>(HeapCharBuffer.java:57)
	at java.nio.CharBuffer.allocate(CharBuffer.java:331)
	at java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:777)
	at org.apache.hadoop.io.Text.decode(Text.java:405)
	at org.apache.hadoop.io.Text.decode(Text.java:382)
	at org.apache.hadoop.io.Text.toString(Text.java:280)
	at org.apache.spark.SparkContext$$anonfun$textFile$1.apply(SparkContext.scala:344)
	at org.apache.spark.SparkContext$$anonfun$textFile$1.apply(SparkContext.scala:344)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
	at scala.collection.Iterator$class.foreach(Iterator.scala:727)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
	at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
	at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:75)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:109)
	at org.apache.spark.scheduler.Task.run(Task.scala:53)
	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:213)
	at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:49)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:744)
{noformat}

> Local spark-shell Runs Out of Memory With Default Settings
> ----------------------------------------------------------
>
>                 Key: SPARK-1392
>                 URL: https://issues.apache.org/jira/browse/SPARK-1392
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 0.9.0
>        Environment: OS X 10.9.2, Java 1.7.0_51, Scala 2.10.3
>            Reporter: Pat McDonough
>
> Using the spark-0.9.0 Hadoop2 binary from the project download page, running
> the spark-shell locally in the out-of-the-box configuration, and attempting to
> cache all the attached data, Spark OOMs with: java.lang.OutOfMemoryError: GC
> overhead limit exceeded
>
> You can work around the issue by either decreasing
> spark.storage.memoryFraction or increasing SPARK_MEM.
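For reference, a minimal sketch of the first workaround (lowering spark.storage.memoryFraction), written as a standalone Spark 0.9.x driver rather than the shell. The local[2] master, the app name, and the 0.4 value are illustrative assumptions, not settings taken from this report; in spark-shell itself the simpler route is the other workaround, exporting SPARK_MEM (e.g. SPARK_MEM=4g, value also an assumption) before launching.

{code}
// Sketch only (Spark 0.9.x), assuming a standalone driver. Master, app name,
// and the 0.4 fraction are illustrative assumptions, not values from this report.
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setMaster("local[2]")
  .setAppName("wiki-links-cache")
  // Per the workaround in the description: give the block store a smaller
  // share of the heap (the default fraction is 0.6) so more memory is left
  // for computing partitions.
  .set("spark.storage.memoryFraction", "0.4")

val sc = new SparkContext(conf)

// Same steps as the repro above.
val explore = sc.textFile("/Users/pat/Projects/training-materials/Data/wiki_links")
explore.cache()
explore.count()
{code}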