[ 
https://issues.apache.org/jira/browse/SPARK-32795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Victor Tso updated SPARK-32795:
-------------------------------
    Attachment: image-2020-09-03-23-27-11-809.png

> ApplicationInfo#removedExecutors can cause OOM
> ----------------------------------------------
>
>                 Key: SPARK-32795
>                 URL: https://issues.apache.org/jira/browse/SPARK-32795
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.4.0
>            Reporter: Victor Tso
>            Priority: Critical
>         Attachments: image-2020-09-03-23-27-11-809.png
>
>
> !image-2020-09-03-23-23-45-294.png!
> In my case, the Standalone Spark master process had a max heap of 1 GB. 738 MB 
> of it was consumed by ExecutorDesc objects, the vast majority of which were 
> the 18.5M entries in removedExecutors. This caused the master to OOM and left 
> the application driver process dangling.
> The reason for this is that the worker node ran out of disk space and, for 
> whatever reason, went into a fast, endless loop of launching new executors, 
> each of which crashed in turn. The count reached 18M before the master could 
> no longer hold the history in memory.
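
The failure mode described above (one history entry kept for every crashed executor, with no upper bound) can be illustrated with a small sketch. This is not Spark's code, which is written in Scala, and it is not necessarily how this issue was or will be resolved. The class and field names only mirror the ones in the report, and the `retained_removed` cap is a hypothetical parameter introduced for illustration. The idea is simply to keep the most recent removed executors and count the rest.

```python
from collections import deque


class ExecutorDesc:
    """Hypothetical stand-in for the master's per-executor record."""
    __slots__ = ("executor_id", "state")

    def __init__(self, executor_id, state="RUNNING"):
        self.executor_id = executor_id
        self.state = state


class ApplicationInfo:
    """Illustrative model of per-application bookkeeping on the master.

    `retained_removed` is an assumed cap, not a real Spark setting.
    """

    def __init__(self, retained_removed=1000):
        self.executors = {}
        # deque(maxlen=n) discards the oldest entry once n entries are held,
        # so memory use stays bounded however many executors crash.
        self.removed_executors = deque(maxlen=retained_removed)
        # A plain counter preserves the total without retaining the objects.
        self.removed_total = 0

    def add_executor(self, executor_id):
        desc = ExecutorDesc(executor_id)
        self.executors[executor_id] = desc
        return desc

    def remove_executor(self, executor_id, final_state="FAILED"):
        desc = self.executors.pop(executor_id, None)
        if desc is None:
            return  # unknown or already removed
        desc.state = final_state
        self.removed_executors.append(desc)
        self.removed_total += 1


def simulate_crash_loop(app, launches):
    """Launch executors that fail immediately, as in the report's scenario."""
    for executor_id in range(launches):
        app.add_executor(executor_id)
        app.remove_executor(executor_id)


if __name__ == "__main__":
    app = ApplicationInfo(retained_removed=1000)
    simulate_crash_loop(app, 100_000)
    print(len(app.removed_executors), app.removed_total)
```

With an unbounded list in place of the deque, the retained history would grow linearly with the number of failed launches, which matches the growth pattern reported here. The trade-off of a cap is that older removed executors are no longer available for display or debugging.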



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
