Hi Spark devs,

I am using Spark 1.6.0 with dynamic allocation on YARN. I am trying to run a relatively big application with tens of jobs and 100K+ tasks, and my app fails with the exception below. The closest JIRA issue I could find is SPARK-11293 <https://issues.apache.org/jira/browse/SPARK-11293>, which is a critical bug that has been open for a long time. There are other similar JIRA issues (all fixed): SPARK-10474 <https://issues.apache.org/jira/browse/SPARK-10474>, SPARK-10733 <https://issues.apache.org/jira/browse/SPARK-10733>, SPARK-10309 <https://issues.apache.org/jira/browse/SPARK-10309>, SPARK-10379 <https://issues.apache.org/jira/browse/SPARK-10379>.

Any workarounds to this issue or any plans to fix it?

Thanks a lot,
Nezih

16/03/19 05:12:09 INFO memory.TaskMemoryManager: Memory used in task 46870
16/03/19 05:12:09 INFO memory.TaskMemoryManager: Acquired by org.apache.spark.shuffle.sort.ShuffleExternalSorter@1c36f801: 32.0 KB
16/03/19 05:12:09 INFO memory.TaskMemoryManager: 1512915599 bytes of memory were used by task 46870 but are not associated with specific consumers
16/03/19 05:12:09 INFO memory.TaskMemoryManager: 1512948367 bytes of memory are used for execution and 156978343 bytes of memory are used for storage
16/03/19 05:12:09 ERROR executor.Executor: Managed memory leak detected; size = 1512915599 bytes, TID = 46870
16/03/19 05:12:09 ERROR executor.Executor: Exception in task 77.0 in stage 273.0 (TID 46870)
java.lang.OutOfMemoryError: Unable to acquire 128 bytes of memory, got 0
    at org.apache.spark.memory.MemoryConsumer.allocatePage(MemoryConsumer.java:120)
    at org.apache.spark.shuffle.sort.ShuffleExternalSorter.acquireNewPageIfNecessary(ShuffleExternalSorter.java:354)
    at org.apache.spark.shuffle.sort.ShuffleExternalSorter.insertRecord(ShuffleExternalSorter.java:375)
    at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.insertRecordIntoSorter(UnsafeShuffleWriter.java:237)
    at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:164)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
16/03/19 05:12:09 ERROR util.SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[Executor task launch worker-8,5,main]
java.lang.OutOfMemoryError: Unable to acquire 128 bytes of memory, got 0
    at org.apache.spark.memory.MemoryConsumer.allocatePage(MemoryConsumer.java:120)
    at org.apache.spark.shuffle.sort.ShuffleExternalSorter.acquireNewPageIfNecessary(ShuffleExternalSorter.java:354)
    at org.apache.spark.shuffle.sort.ShuffleExternalSorter.insertRecord(ShuffleExternalSorter.java:375)
    at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.insertRecordIntoSorter(UnsafeShuffleWriter.java:237)
    at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:164)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
16/03/19 05:12:10 INFO storage.DiskBlockManager: Shutdown hook called
16/03/19 05:12:10 INFO util.ShutdownHookManager: Shutdown hook called
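
To make the numbers in the log above easier to read, here is a small standalone sketch in Python (not part of the failing application) that checks how the reported figures relate to each other. The byte counts are copied from the TaskMemoryManager lines. The one assumption is that the "32.0 KB" reported for the ShuffleExternalSorter means exactly 32 * 1024 bytes. The variable names are only illustrative.

```python
# Figures copied from the TaskMemoryManager lines in the log for task 46870.
# Assumption: "32.0 KB" is exactly 32 * 1024 bytes (the log prints a rounded value).
sorter_bytes = 32 * 1024             # acquired by ShuffleExternalSorter
unattributed_bytes = 1512915599      # "not associated with specific consumers"
execution_bytes = 1512948367         # "used for execution"
storage_bytes = 156978343            # "used for storage"


def to_gib(num_bytes):
    """Convert a byte count to GiB for readability."""
    return num_bytes / 1024 ** 3


# Memory the task holds = what the sorter acquired + what has no known consumer.
accounted_bytes = sorter_bytes + unattributed_bytes

# Fraction of the execution memory that has no tracked consumer.
unattributed_share = unattributed_bytes / execution_bytes

print("sorter + unattributed == execution:", accounted_bytes == execution_bytes)
print("unattributed: %.2f GiB" % to_gib(unattributed_bytes))
print("unattributed share of execution memory: %.5f" % unattributed_share)
```

Under that assumption the sorter's 32 KB plus the unattributed bytes add up exactly to the reported execution memory, so nearly all of the execution memory in this executor is held by task 46870 without a tracked consumer. The size of the "Managed memory leak detected" message is the same unattributed figure.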