[ https://issues.apache.org/jira/browse/SPARK-11278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14972155#comment-14972155 ]

Nishkam Ravi edited comment on SPARK-11278 at 10/24/15 12:05 AM:
-----------------------------------------------------------------

Yeah, the problem goes away with spark.memory.useLegacyMode = true (as expected).
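
For reference, a minimal sketch of how that flag can be set (the full property name is spark.memory.useLegacyMode; the app name below is just a placeholder):

    // Fall back to the pre-1.6 StaticMemoryManager instead of the unified manager
    val conf = new org.apache.spark.SparkConf()
      .setAppName("PageRankRepro")                  // placeholder
      .set("spark.memory.useLegacyMode", "true")
    val sc = new org.apache.spark.SparkContext(conf)

The same flag can also be passed with --conf spark.memory.useLegacyMode=true on spark-submit.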

In the executor logs, I see large spills:
15/10/23 14:27:13 INFO collection.ExternalSorter: Thread 145 spilling in-memory map of 1477.0 MB to disk (1 time so far)

and OOM errors:
15/10/23 14:47:44 ERROR executor.Executor: Exception in task 99.0 in stage 0.0 (TID 94)
java.lang.OutOfMemoryError: Java heap space
        at org.apache.spark.util.collection.AppendOnlyMap.growTable(AppendOnlyMap.scala:218)
        at org.apache.spark.util.collection.SizeTrackingAppendOnlyMap.growTable(SizeTrackingAppendOnlyMap.scala:38)
        at org.apache.spark.util.collection.AppendOnlyMap.incrementSize(AppendOnlyMap.scala:204)
        at org.apache.spark.util.collection.AppendOnlyMap.changeValue(AppendOnlyMap.scala:151)
        at org.apache.spark.util.collection.SizeTrackingAppendOnlyMap.changeValue(SizeTrackingAppendOnlyMap.scala:32)
        at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:210)
        at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:73)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
        at org.apache.spark.scheduler.Task.run(Task.scala:88)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

Different values of spark.memory.fraction and spark.memory.storageFraction 
didn't help either.
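
(For completeness, roughly how those were varied; the values below are illustrative, not the exact settings from these runs. The 1.6 defaults are 0.75 and 0.5 respectively.)

    // Illustrative values only
    val conf = new org.apache.spark.SparkConf()
      .set("spark.memory.fraction", "0.6")
      .set("spark.memory.storageFraction", "0.3")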

With smaller executors the workload goes through, but with a 1.6x performance degradation (compared to runs without this commit). The spills are much smaller:
15/10/23 15:43:26 INFO collection.ExternalSorter: Thread 117 spilling in-memory map of 5.0 MB to disk (1 time so far)
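
(Here "smaller executors" means a smaller per-executor heap and fewer cores per executor; the sizes below are illustrative only, not the exact configuration used:)

    // Illustrative sizes only -- exact values not recorded here
    val conf = new org.apache.spark.SparkConf()
      .set("spark.executor.memory", "4g")
      .set("spark.executor.cores", "2")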







> PageRank fails with unified memory manager
> ------------------------------------------
>
>                 Key: SPARK-11278
>                 URL: https://issues.apache.org/jira/browse/SPARK-11278
>             Project: Spark
>          Issue Type: Bug
>          Components: GraphX, Spark Core
>    Affects Versions: 1.5.2, 1.6.0
>            Reporter: Nishkam Ravi
>
> PageRank (6 nodes, 32GB input) runs very slowly and eventually fails with 
> ExecutorLostFailure. Traced it back to the 'unified memory manager' commit 
> from Oct 13th. Took a quick look at the code and couldn't see the problem 
> (changes look pretty good). cc'ing [~andrewor14][~vanzin] who may be able to 
> spot the problem quickly. Can be reproduced by running PageRank on a large 
> enough input dataset if needed. Sorry for not being of much help here.
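
A rough GraphX repro sketch along the lines of the description above, assuming an existing SparkContext sc (the input path and iteration count are placeholders, not the actual dataset used):

    import org.apache.spark.graphx.GraphLoader
    // Placeholder edge-list path and iteration count
    val graph = GraphLoader.edgeListFile(sc, "hdfs:///data/large-edge-list.txt")
    val ranks = graph.staticPageRank(10).vertices
    ranks.count()   // force evaluation so the shuffle-heavy stages actually run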


