[
https://issues.apache.org/jira/browse/SPARK-11293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15152042#comment-15152042
]
Ovidiu Marcu commented on SPARK-11293:
--------------------------------------
While trying to run the GraphX Analytics triangle count on
http://snap.stanford.edu/data/com-Friendster.html (it works fine with LiveJournal)
on about 24 nodes (Spark standalone 1.5.2, around 32 GB of memory per node, 128
partitions, parallelism = cores*nodes*8), I get the following errors, which may be related:
16/02/18 08:52:11 ERROR Executor: Managed memory leak detected; size = 67108864 bytes, TID = 507
16/02/18 08:52:11 INFO MapOutputTrackerWorker: Updating epoch to 8 and clearing cache
16/02/18 08:52:11 INFO TorrentBroadcast: Started reading broadcast variable 6
16/02/18 08:52:11 ERROR Executor: Exception in task 123.0 in stage 3.0 (TID 507)
java.lang.OutOfMemoryError: Java heap space
	at scala.reflect.ManifestFactory$$anon$10.newArray(Manifest.scala:122)
	at scala.reflect.ManifestFactory$$anon$10.newArray(Manifest.scala:120)
	at org.apache.spark.util.collection.OpenHashSet.rehash(OpenHashSet.scala:231)
	at org.apache.spark.util.collection.OpenHashSet.rehashIfNeeded(OpenHashSet.scala:166)
	at org.apache.spark.util.collection.OpenHashSet.rehashIfNeeded$mcJ$sp(OpenHashSet.scala:164)
	at org.apache.spark.graphx.util.collection.GraphXPrimitiveKeyOpenHashMap$mcJI$sp.changeValue$mcJI$sp(GraphXPrimitiveKeyOpenHashMap.scala:107)
	at org.apache.spark.graphx.impl.EdgePartitionBuilder.toEdgePartition(EdgePartitionBuilder.scala:58)
	at org.apache.spark.graphx.impl.GraphImpl$$anonfun$4.apply(GraphImpl.scala:115)
	at org.apache.spark.graphx.impl.GraphImpl$$anonfun$4.apply(GraphImpl.scala:109)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$18.apply(RDD.scala:727)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$18.apply(RDD.scala:727)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
	at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:262)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
	at org.apache.spark.scheduler.Task.run(Task.scala:88)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
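The "Managed memory leak detected" line at the top of the log appears to come from a per-task leak check: whatever execution memory a task still holds when it ends is freed and the amount is reported. The Scala sketch below models that bookkeeping under stated assumptions only; `TaskMemoryLedger`, `LeakCheck.runTask` and the consumer names are hypothetical and are not Spark's `TaskMemoryManager` API. In the sketch the check runs in a `finally` block, so it also fires when the task body throws, which would be consistent with the leak message for TID 507 appearing just before the `OutOfMemoryError` above.

```scala
import scala.collection.mutable

// Hypothetical per-task ledger, NOT Spark's TaskMemoryManager API. It only
// records how many bytes each named consumer has acquired and released.
final class TaskMemoryLedger(val taskId: Long) {
  private val held = mutable.Map.empty[String, Long]

  def acquire(consumer: String, bytes: Long): Unit = {
    require(bytes >= 0L, "bytes must be non-negative")
    held.update(consumer, held.getOrElse(consumer, 0L) + bytes)
  }

  def release(consumer: String, bytes: Long): Unit = {
    val current = held.getOrElse(consumer, 0L)
    require(bytes >= 0L && bytes <= current, s"$consumer cannot release $bytes bytes")
    held.update(consumer, current - bytes)
  }

  // Frees everything still held and returns how many bytes that was,
  // i.e. the memory that no consumer released on its own.
  def cleanUpAll(): Long = {
    val leaked = held.values.sum
    held.clear()
    leaked
  }
}

object LeakCheck {
  // Runs a task body, then performs the leak check in `finally` so that it
  // also happens when the body throws (e.g. an OutOfMemoryError).
  def runTask(taskId: Long, log: String => Unit)(body: TaskMemoryLedger => Unit): Unit = {
    val ledger = new TaskMemoryLedger(taskId)
    try body(ledger)
    finally {
      val leaked = ledger.cleanUpAll()
      if (leaked > 0L) log(s"Managed memory leak detected; size = $leaked bytes, TID = $taskId")
    }
  }
}
```

For example, `LeakCheck.runTask(507L, println) { l => l.acquire("sorter", 67108864L) }` prints `Managed memory leak detected; size = 67108864 bytes, TID = 507`, the same text as the first log entry. Note that 67108864 bytes is exactly 64 MiB (2^26 bytes).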
> Spillable collections leak shuffle memory
> -----------------------------------------
>
> Key: SPARK-11293
> URL: https://issues.apache.org/jira/browse/SPARK-11293
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 1.3.1, 1.4.1, 1.5.1, 1.6.0
> Reporter: Josh Rosen
> Assignee: Josh Rosen
> Priority: Critical
>
> I discovered multiple leaks of shuffle memory while working on my memory
> manager consolidation patch, which added the ability to do strict memory leak
> detection for the bookkeeping that used to be performed by the
> ShuffleMemoryManager. This uncovered a handful of places where tasks can
> acquire execution/shuffle memory but never release it, starving themselves of
> memory.
> Problems that I found:
> * {{ExternalSorter.stop()}} should release the sorter's shuffle/execution
> memory.
> * BlockStoreShuffleReader should call {{ExternalSorter.stop()}} using a
> {{CompletionIterator}}.
> * {{ExternalAppendOnlyMap}} exposes no equivalent of {{stop()}} for freeing
> its resources.
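The fix listed above (give the sorter a `stop()` that releases its memory, and have the shuffle reader call it through a `CompletionIterator`) can be sketched as follows. This is a minimal illustration under stated assumptions: `FakeSorter`, `CompletingIterator` and `ShuffleReadSketch` are hypothetical stand-ins, not Spark's `ExternalSorter`, `CompletionIterator` or `BlockStoreShuffleReader`. The wrapper runs a cleanup callback exactly once, the first time the underlying iterator reports that it is exhausted.

```scala
// Hypothetical stand-in for a spillable collection such as ExternalSorter:
// it "holds" execution memory until stop() is called. Not Spark's class.
final class FakeSorter(records: Seq[Int], initialBytes: Long) {
  private var held: Long = initialBytes
  def iterator: Iterator[Int] = records.sorted.iterator
  def heldBytes: Long = held
  // Releasing is idempotent, so calling stop() twice cannot double-free.
  def stop(): Unit = { held = 0L }
}

// Minimal completion-callback iterator in the spirit of the CompletionIterator
// mentioned in the issue (a sketch, not Spark's implementation): `onComplete`
// runs exactly once, the first time hasNext reports exhaustion.
final class CompletingIterator[A](underlying: Iterator[A], onComplete: () => Unit)
    extends Iterator[A] {
  private var completed = false

  override def hasNext: Boolean = {
    val more = underlying.hasNext
    if (!more && !completed) {
      completed = true
      onComplete()
    }
    more
  }

  override def next(): A = underlying.next()
}

object ShuffleReadSketch {
  // Hypothetical reader: hands out sorted records and frees the sorter's
  // memory as soon as the caller has drained the last one.
  def read(sorter: FakeSorter): Iterator[Int] =
    new CompletingIterator[Int](sorter.iterator, () => sorter.stop())
}
```

Draining the returned iterator (for example with `toList`) drops `heldBytes` to 0. A consumer that stops early, such as one that only calls `next()` once, never triggers the callback, so in this sketch a task-completion hook would still be needed as a backstop for partially consumed iterators.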
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)