[ 
https://issues.apache.org/jira/browse/SPARK-29177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrian Wang updated SPARK-29177:
--------------------------------
    Description: When we fetch results from executors and find that the total
size exceeds the configured maxResultSize, Spark simply aborts the stage and
all dependent jobs. However, the task that triggered the abort actually
succeeded, yet it never posts a `TaskEnd` event, so it is never removed from
`CoarseGrainedSchedulerBackend`. If dynamic allocation is enabled, zombie
executor(s) remain in the resource manager and never die until the
application ends.  (was: When we fetch results from executors and find that
the total size exceeds the configured maxResultSize, Spark simply aborts the
stage and all dependent jobs. However, the task that triggered the abort
actually succeeded, yet it never posts a `CompletionEvent`, so it is never
removed from `CoarseGrainedSchedulerBackend`. If dynamic allocation is
enabled, zombie executor(s) remain in the resource manager and never die
until the application ends.)

> Zombie tasks prevents executor from releasing when task exceeds maxResultSize
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-29177
>                 URL: https://issues.apache.org/jira/browse/SPARK-29177
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.3.4, 2.4.4
>            Reporter: Adrian Wang
>            Priority: Major
>
> When we fetch results from executors and find that the total size exceeds
> the configured maxResultSize, Spark simply aborts the stage and all
> dependent jobs. However, the task that triggered the abort actually
> succeeded, yet it never posts a `TaskEnd` event, so it is never removed from
> `CoarseGrainedSchedulerBackend`. If dynamic allocation is enabled, zombie
> executor(s) remain in the resource manager and never die until the
> application ends.
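
The bookkeeping problem described above can be illustrated with a toy model.
This is only a sketch under stated assumptions, not Spark's implementation:
`ToyExecutorTracker`, `fetch_result` and the `post_end_on_abort` flag are
hypothetical names invented for illustration. The model assumes an executor can
be released only once every task started on it has had an end event recorded,
and that the abort path for an oversized result skips that event.

```python
from dataclasses import dataclass, field


@dataclass
class ToyExecutorTracker:
    """Hypothetical tracker: an executor is idle only with no running tasks."""
    running: dict = field(default_factory=dict)  # executor id -> set of task ids

    def on_task_start(self, executor_id, task_id):
        self.running.setdefault(executor_id, set()).add(task_id)

    def on_task_end(self, executor_id, task_id):
        self.running.get(executor_id, set()).discard(task_id)

    def is_idle(self, executor_id):
        # Idle means releasable in this toy model.
        return not self.running.get(executor_id)


def fetch_result(tracker, executor_id, task_id, result_size,
                 fetched_so_far, max_result_size, post_end_on_abort):
    """Return True if the result is accepted, False if the stage is aborted."""
    if fetched_so_far + result_size > max_result_size:
        # Abort path. With post_end_on_abort=False this mimics the reported
        # behaviour: the task finished, but no end event is ever recorded.
        if post_end_on_abort:
            tracker.on_task_end(executor_id, task_id)
        return False
    # Normal path: the task ends and its executor bookkeeping is cleared.
    tracker.on_task_end(executor_id, task_id)
    return True


tracker = ToyExecutorTracker()
tracker.on_task_start("exec-1", 7)
accepted = fetch_result(tracker, "exec-1", 7, result_size=200,
                        fetched_so_far=900, max_result_size=1024,
                        post_end_on_abort=False)
print(accepted, tracker.is_idle("exec-1"))  # False False
```

In this model, the executor stays non-idle forever after the abort, which
corresponds to the zombie executor in the report. Passing
`post_end_on_abort=True` clears the entry and lets the executor become idle.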



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
