[jira] [Commented] (SPARK-11293) Spillable collections leak shuffle memory

Ovidiu Marcu (JIRA) Thu, 18 Feb 2016 01:53:32 -0800

    [ https://issues.apache.org/jira/browse/SPARK-11293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15152042#comment-15152042 ]


Ovidiu Marcu commented on SPARK-11293:
--------------------------------------

While trying to run Analytics Triangle Count on the Friendster dataset 
(http://snap.stanford.edu/data/com-Friendster.html; LiveJournal works fine) 
on about 24 nodes (Spark standalone 1.5.2, around 32g memory per node, 128 
partitions, parallelism cores*nodes*8), I get some errors that may be related:
16/02/18 08:52:11 ERROR Executor: Managed memory leak detected; size = 67108864 bytes, TID = 507
16/02/18 08:52:11 INFO MapOutputTrackerWorker: Updating epoch to 8 and clearing cache
16/02/18 08:52:11 INFO TorrentBroadcast: Started reading broadcast variable 6
16/02/18 08:52:11 ERROR Executor: Exception in task 123.0 in stage 3.0 (TID 507)
java.lang.OutOfMemoryError: Java heap space
        at scala.reflect.ManifestFactory$$anon$10.newArray(Manifest.scala:122)
        at scala.reflect.ManifestFactory$$anon$10.newArray(Manifest.scala:120)
        at org.apache.spark.util.collection.OpenHashSet.rehash(OpenHashSet.scala:231)
        at org.apache.spark.util.collection.OpenHashSet.rehashIfNeeded(OpenHashSet.scala:166)
        at org.apache.spark.util.collection.OpenHashSet.rehashIfNeeded$mcJ$sp(OpenHashSet.scala:164)
        at org.apache.spark.graphx.util.collection.GraphXPrimitiveKeyOpenHashMap$mcJI$sp.changeValue$mcJI$sp(GraphXPrimitiveKeyOpenHashMap.scala:107)
        at org.apache.spark.graphx.impl.EdgePartitionBuilder.toEdgePartition(EdgePartitionBuilder.scala:58)
        at org.apache.spark.graphx.impl.GraphImpl$$anonfun$4.apply(GraphImpl.scala:115)
        at org.apache.spark.graphx.impl.GraphImpl$$anonfun$4.apply(GraphImpl.scala:109)
        at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$18.apply(RDD.scala:727)
        at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$18.apply(RDD.scala:727)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
        at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:262)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
        at org.apache.spark.scheduler.Task.run(Task.scala:88)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

> Spillable collections leak shuffle memory
> -----------------------------------------
>
>                 Key: SPARK-11293
>                 URL: https://issues.apache.org/jira/browse/SPARK-11293
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.3.1, 1.4.1, 1.5.1, 1.6.0
>            Reporter: Josh Rosen
>            Assignee: Josh Rosen
>            Priority: Critical
>
> I discovered multiple leaks of shuffle memory while working on my memory 
> manager consolidation patch, which added the ability to do strict memory leak 
> detection for the bookkeeping that used to be performed by the 
> ShuffleMemoryManager. This uncovered a handful of places where tasks can 
> acquire execution/shuffle memory but never release it, starving themselves of 
> memory.
> Problems that I found:
> * {{ExternalSorter.stop()}} should release the sorter's shuffle/execution 
> memory.
> * BlockStoreShuffleReader should call {{ExternalSorter.stop()}} using a 
> {{CompletionIterator}}.
> * {{ExternalAppendOnlyMap}} exposes no equivalent of {{stop()}} for freeing 
> its resources.
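
For readers unfamiliar with the pattern named in the second bullet of the quoted description, the following is a minimal sketch of a completion iterator that calls a cleanup hook once the wrapped iterator is exhausted. It is not Spark's code: FakeSorter, CompletingIterator and CompletionDemo are hypothetical names introduced only for illustration, and the stand-in sorter just records that stop() was called.

```scala
// Hypothetical stand-in for a sorter that holds execution memory.
// This is NOT Spark's ExternalSorter; it only records that stop() ran.
final class FakeSorter(records: Seq[Int]) {
  var stopped = false
  def iterator: Iterator[Int] = records.sorted.iterator
  // Releases whatever the sorter holds; safe to call more than once.
  def stop(): Unit = { stopped = true }
}

// Minimal completion iterator (illustrative, not Spark's CompletionIterator):
// runs `onComplete` exactly once, the first time the wrapped iterator
// reports that it has no more elements.
final class CompletingIterator[A](underlying: Iterator[A], onComplete: () => Unit)
    extends Iterator[A] {
  private var completed = false

  override def hasNext: Boolean = {
    val more = underlying.hasNext
    if (!more && !completed) {
      completed = true
      onComplete()
    }
    more
  }

  override def next(): A = underlying.next()
}

object CompletionDemo {
  def main(args: Array[String]): Unit = {
    val sorter = new FakeSorter(Seq(3, 1, 2))
    val it = new CompletingIterator(sorter.iterator, () => sorter.stop())
    val out = it.toList // draining the iterator triggers sorter.stop()
    println(s"out=$out stopped=${sorter.stopped}") // prints "out=List(1, 2, 3) stopped=true"
  }
}
```

In this sketch the hook fires only when the consumer drains the iterator. A consumer that stops reading early never triggers it, so a wrapper like this does not cover that case on its own.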



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
