[ https://issues.apache.org/jira/browse/SPARK-11278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14972155#comment-14972155 ]
Nishkam Ravi edited comment on SPARK-11278 at 10/24/15 12:05 AM:
-----------------------------------------------------------------
Yeah, the problem goes away with spark.memory.useLegacyMode = true (as expected).
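For anyone following along, a minimal sketch of enabling that fallback (the config key names are as in Spark 1.5/1.6; the job itself is omitted and the app name is a placeholder):

import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: fall back to the pre-unified (static) memory manager.
// With this set, the old spark.shuffle.memoryFraction and
// spark.storage.memoryFraction settings apply again.
val conf = new SparkConf()
  .setAppName("PageRank-legacy-memory") // placeholder app name
  .set("spark.memory.useLegacyMode", "true")
val sc = new SparkContext(conf)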
In the executor logs, I see large spills:
15/10/23 14:27:13 INFO collection.ExternalSorter: Thread 145 spilling in-memory map of 1477.0 MB to disk (1 time so far)
and OOM errors:
15/10/23 14:47:44 ERROR executor.Executor: Exception in task 99.0 in stage 0.0 (TID 94)
java.lang.OutOfMemoryError: Java heap space
        at org.apache.spark.util.collection.AppendOnlyMap.growTable(AppendOnlyMap.scala:218)
        at org.apache.spark.util.collection.SizeTrackingAppendOnlyMap.growTable(SizeTrackingAppendOnlyMap.scala:38)
        at org.apache.spark.util.collection.AppendOnlyMap.incrementSize(AppendOnlyMap.scala:204)
        at org.apache.spark.util.collection.AppendOnlyMap.changeValue(AppendOnlyMap.scala:151)
        at org.apache.spark.util.collection.SizeTrackingAppendOnlyMap.changeValue(SizeTrackingAppendOnlyMap.scala:32)
        at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:210)
        at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:73)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
        at org.apache.spark.scheduler.Task.run(Task.scala:88)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Trying different values of spark.memory.fraction and spark.memory.storageFraction didn't help either.
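A sketch of how such overrides would be passed; the values below are placeholders rather than the exact ones tried (the 1.6 defaults are spark.memory.fraction = 0.75 and spark.memory.storageFraction = 0.5):

import org.apache.spark.SparkConf

// Sketch only: example overrides for the unified memory manager knobs.
// Placeholder values, not the exact ones from the runs described above.
val conf = new SparkConf()
  .set("spark.memory.fraction", "0.6")
  .set("spark.memory.storageFraction", "0.3")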
With smaller executors, the workload goes through, but with a 1.6x performance degradation compared to runs without this commit. The spills are much smaller:
15/10/23 15:43:26 INFO collection.ExternalSorter: Thread 117 spilling in-memory map of 5.0 MB to disk (1 time so far)
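For completeness, a sketch of the kind of executor downsizing meant above (sizes are placeholders, not the ones actually used):

import org.apache.spark.SparkConf

// Sketch: smaller-executor run. These must be set before the SparkContext
// is created, or passed as --executor-memory / --executor-cores to spark-submit.
val conf = new SparkConf()
  .set("spark.executor.memory", "8g") // placeholder size
  .set("spark.executor.cores", "4")   // placeholder core count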
> PageRank fails with unified memory manager
> ------------------------------------------
>
> Key: SPARK-11278
> URL: https://issues.apache.org/jira/browse/SPARK-11278
> Project: Spark
> Issue Type: Bug
> Components: GraphX, Spark Core
> Affects Versions: 1.5.2, 1.6.0
> Reporter: Nishkam Ravi
>
> PageRank (6 nodes, 32GB input) runs very slowly and eventually fails with
> ExecutorLostFailure. Traced it back to the 'unified memory manager' commit
> from Oct 13th. Took a quick look at the code and couldn't see the problem
> (the changes look pretty good). cc'ing [~andrewor14] [~vanzin], who may be able
> to spot the problem quickly. Can be reproduced by running PageRank on a large
> enough input dataset if needed (see the sketch below). Sorry for not being of
> much help here.
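A minimal GraphX sketch along the lines described above, for anyone trying to reproduce. The input path, app name, and iteration count are placeholders; any sufficiently large edge-list file should exercise the same sort-shuffle path as the logs above:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.GraphLoader

object PageRankRepro {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("PageRankRepro"))
    // Placeholder path: point this at a large enough edge-list file.
    val graph = GraphLoader.edgeListFile(sc, "hdfs:///data/edges.txt")
    // 10 static PageRank iterations, as in the examples shipped with Spark.
    val ranks = graph.staticPageRank(numIter = 10).vertices
    println(s"vertices ranked: ${ranks.count()}")
    sc.stop()
  }
}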