[
https://issues.apache.org/jira/browse/SPARK-29287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Xingbo Jiang resolved SPARK-29287.
----------------------------------
Fix Version/s: 3.0.0
Assignee: Kent Yao
Resolution: Done
Fixed by https://github.com/apache/spark/pull/25964
> Executors should not receive any offers before they are actually constructed
> -----------------------------------------------------------------------------
>
> Key: SPARK-29287
> URL: https://issues.apache.org/jira/browse/SPARK-29287
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 3.0.0
> Reporter: Kent Yao
> Assignee: Kent Yao
> Priority: Major
> Fix For: 3.0.0
>
>
> Executors send RegisterExecutor messages to the driver in onStart.
> The driver puts the executor data in “the ready to serve map” if it can,
> then sends RegisteredExecutor back to the executor. From that point on, the
> driver can make offers to this executor.
> But the executor is not fully constructed yet. When it receives
> RegisteredExecutor, it starts to construct itself: initializing the block
> manager, possibly registering with the local shuffle server (with retries),
> then starting to send heartbeats to the driver ...
> A task allocated at this point may fail if the executor fails to start or
> cannot send heartbeats to the driver in time.
> Even worse, when dynamic allocation and blacklisting are enabled and the
> number of running executors is down to the configured minimum, if those
> executors receive tasks before they are fully constructed and any error
> happens, the application may be blocked or torn down.
>
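The race described above can be illustrated with a toy model. This is a minimal sketch in Python, not Spark code and not necessarily what the pull request does: the class ToyDriver, the defer_offers flag, and the mark_constructed step are all hypothetical names introduced for illustration. It only contrasts "offerable as soon as registered" with "offerable only after the executor reports it is fully constructed".

```python
class ToyDriver:
    """Toy stand-in for the driver side (hypothetical, not Spark code)."""

    def __init__(self, defer_offers):
        # defer_offers is an assumed switch between the two behaviours.
        self.defer_offers = defer_offers
        self.registered = set()   # executors that sent RegisterExecutor
        self.constructed = set()  # executors that reported full construction

    def register_executor(self, executor_id):
        # Models handling RegisterExecutor and replying RegisteredExecutor.
        self.registered.add(executor_id)

    def mark_constructed(self, executor_id):
        # Hypothetical extra step: the executor reports that its block manager,
        # shuffle registration and heartbeats are up. Unknown ids are ignored.
        if executor_id in self.registered:
            self.constructed.add(executor_id)

    def offerable_executors(self):
        # Executors the driver may currently make offers to.
        if self.defer_offers:
            return sorted(self.registered & self.constructed)
        return sorted(self.registered)


# Behaviour described in the issue: offerable right after registration.
eager = ToyDriver(defer_offers=False)
eager.register_executor("exec-1")
print(eager.offerable_executors())

# Deferred variant: offerable only once construction has been reported.
deferred = ToyDriver(defer_offers=True)
deferred.register_executor("exec-1")
print(deferred.offerable_executors())
deferred.mark_constructed("exec-1")
print(deferred.offerable_executors())
```

In the deferred variant, a task can never be assigned to an executor that is still initializing, at the cost of one extra message per executor start.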