[ https://issues.apache.org/jira/browse/SPARK-1392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13957379#comment-13957379 ]
Pat McDonough commented on SPARK-1392:
--------------------------------------

Running the following with the attached data results in the errors below:

{code}
scala> val explore = sc.textFile("/Users/pat/Projects/training-materials/Data/wiki_links")
...
scala> explore.cache
res1: org.apache.spark.rdd.RDD[String] = MappedRDD[1] at textFile at <console>:12
...
scala> explore.count
...
14/04/01 22:52:48 INFO HadoopRDD: Input split: file:/Users/pat/Projects/training-materials/Data/wiki_links/part-00007:0+25009430
14/04/01 22:52:54 INFO MemoryStore: ensureFreeSpace(55520836) called with curMem=271402430, maxMem=309225062
14/04/01 22:52:54 INFO MemoryStore: Will not store rdd_1_7 as it would require dropping another block from the same RDD
14/04/01 22:52:54 INFO BlockManager: Dropping block rdd_1_7 from memory
14/04/01 22:52:54 WARN BlockManager: Block rdd_1_7 could not be dropped from memory as it does not exist
14/04/01 22:52:54 INFO BlockManagerMaster: Updated info of block rdd_1_7
14/04/01 22:52:54 INFO BlockManagerMaster: Updated info of block rdd_1_7
14/04/01 22:52:54 INFO Executor: Serialized size of result for 7 is 563
14/04/01 22:52:54 INFO Executor: Sending result for 7 directly to driver
14/04/01 22:52:54 INFO Executor: Finished task ID 7
14/04/01 22:52:54 INFO TaskSetManager: Starting task 0.0:8 as TID 8 on executor localhost: localhost (PROCESS_LOCAL)
14/04/01 22:52:54 INFO TaskSetManager: Serialized task 0.0:8 as 1606 bytes in 2 ms
14/04/01 22:52:54 INFO Executor: Running task ID 8
14/04/01 22:52:54 INFO TaskSetManager: Finished TID 7 in 6714 ms on localhost (progress: 7/10)
14/04/01 22:52:54 INFO DAGScheduler: Completed ResultTask(0, 7)
14/04/01 22:52:54 INFO BlockManager: Found block broadcast_0 locally
14/04/01 22:52:54 INFO CacheManager: Partition rdd_1_8 not found, computing it
14/04/01 22:52:54 INFO HadoopRDD: Input split: file:/Users/pat/Projects/training-materials/Data/wiki_links/part-00008:0+25904930
14/04/01 22:52:59 INFO TaskSetManager: Starting task 0.0:9 as TID 9 on executor localhost: localhost (PROCESS_LOCAL)
14/04/01 22:52:59 ERROR Executor: Exception in task ID 8
{code}

{noformat}
java.lang.OutOfMemoryError: GC overhead limit exceeded
	at java.nio.HeapCharBuffer.<init>(HeapCharBuffer.java:57)
	at java.nio.CharBuffer.allocate(CharBuffer.java:331)
	at java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:777)
	at org.apache.hadoop.io.Text.decode(Text.java:405)
	at org.apache.hadoop.io.Text.decode(Text.java:382)
	at org.apache.hadoop.io.Text.toString(Text.java:280)
	at org.apache.spark.SparkContext$$anonfun$textFile$1.apply(SparkContext.scala:344)
	at org.apache.spark.SparkContext$$anonfun$textFile$1.apply(SparkContext.scala:344)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
	at scala.collection.Iterator$class.foreach(Iterator.scala:727)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
	at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
	at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:75)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:109)
	at org.apache.spark.scheduler.Task.run(Task.scala:53)
	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:213)
	at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:49)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:744)
{noformat}

> Local spark-shell Runs Out of Memory With Default Settings
> ----------------------------------------------------------
>
>                 Key: SPARK-1392
>                 URL: https://issues.apache.org/jira/browse/SPARK-1392
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 0.9.0
>        Environment: OS X 10.9.2, Java 1.7.0_51, Scala 2.10.3
>            Reporter: Pat McDonough
>
> Using the spark-0.9.0 Hadoop2 binary from the project download page, running
> the spark-shell locally in the out-of-the-box configuration, and attempting to
> cache all the attached data, Spark OOMs with: java.lang.OutOfMemoryError: GC
> overhead limit exceeded
>
> You can work around the issue by either decreasing
> spark.storage.memoryFraction or increasing SPARK_MEM.
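For reference, a minimal sketch of the first workaround (lowering spark.storage.memoryFraction), written as a standalone Spark 0.9.x driver rather than the shell. The local[2] master, the app name, and the 0.4 value are illustrative assumptions, not settings taken from this report; in spark-shell itself the simpler route is the other workaround, exporting SPARK_MEM (e.g. SPARK_MEM=4g, value also an assumption) before launching.

{code}
// Sketch only (Spark 0.9.x), assuming a standalone driver. Master, app name,
// and the 0.4 fraction are illustrative assumptions, not values from this report.
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setMaster("local[2]")
  .setAppName("wiki-links-cache")
  // Per the workaround in the description: give the block store a smaller
  // share of the heap (the default fraction is 0.6) so more memory is left
  // for computing partitions.
  .set("spark.storage.memoryFraction", "0.4")

val sc = new SparkContext(conf)

// Same steps as the repro above.
val explore = sc.textFile("/Users/pat/Projects/training-materials/Data/wiki_links")
explore.cache()
explore.count()
{code}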