[ https://issues.apache.org/jira/browse/SPARK-32795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Victor Tso updated SPARK-32795:
-------------------------------
Attachment: image-2020-09-03-23-27-11-809.png
> ApplicationInfo#removedExecutors can cause OOM
> ----------------------------------------------
>
> Key: SPARK-32795
> URL: https://issues.apache.org/jira/browse/SPARK-32795
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.4.0
> Reporter: Victor Tso
> Priority: Critical
> Attachments: image-2020-09-03-23-27-11-809.png
>
>
> !image-2020-09-03-23-23-45-294.png!
> In my case, the standalone Spark master process had a max heap of 1 GB. 738 MB
> of it was consumed by ExecutorDesc objects, the vast majority of which were
> the 18.5M entries in removedExecutors. This caused the master to run out of
> memory (OOM) and left the application driver process dangling.
> The root cause was that the worker node ran out of disk space. For reasons I
> have not determined, it then went into a fast, endless loop of launching new
> executors, each of which crashed in turn. The removed-executor history grew to
> about 18M entries before the master could no longer hold it.
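
The failure mode described above is an append-only history that grows by one entry per crashed executor. One common mitigation, shown here only as a minimal sketch and not as the fix adopted in Spark, is to cap the retained history and keep a running counter instead. The names ExecutorRecord, AppHistory, max_retained and remove_executor are hypothetical and do not correspond to Spark's classes or configuration keys; the sketch is in Python for brevity.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ExecutorRecord:
    """Hypothetical stand-in for one removed-executor entry."""
    executor_id: int
    exit_reason: str


@dataclass
class AppHistory:
    """Hypothetical per-application record with a capped removal history."""
    max_retained: int = 1000   # illustrative cap, not a real Spark setting
    total_removed: int = 0     # running count survives even when entries are dropped
    removed: deque = field(init=False)

    def __post_init__(self):
        # A deque with maxlen silently discards the oldest entry once full,
        # so memory use stays bounded no matter how many executors crash.
        self.removed = deque(maxlen=self.max_retained)

    def remove_executor(self, record: ExecutorRecord) -> None:
        self.total_removed += 1
        self.removed.append(record)


if __name__ == "__main__":
    history = AppHistory(max_retained=3)
    for i in range(10):
        history.remove_executor(ExecutorRecord(i, "worker out of disk space"))
    print(len(history.removed), history.total_removed)
```

With a cap like this, a crash loop still shows up in the counter, but the master retains only the most recent entries. For scale, 738 MB spread over 18.5M entries works out to roughly 40 bytes per retained entry, so any fixed cap in the thousands would be negligible against a 1 GB heap.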