[
https://issues.apache.org/jira/browse/SPARK-20079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Guoqiang Li updated SPARK-20079:
--------------------------------
Description:
The ExecutorAllocationManager.reset method is called when the AM re-registers,
which sets the ExecutorAllocationManager.initializing field to true. While this
field is true, the driver does not start new executors in response to AM requests.
Only two events set the field back to false:
1. An executor has been idle for some time.
2. A new stage is submitted.
If the AM is killed and restarted after a stage has been submitted, neither
event can occur:
1. When the AM is killed, YARN kills all running containers, so every executor
is lost and none can become idle.
2. With no surviving executors, the current stage can never complete, so the
DAGScheduler never submits a new stage.
Reproduction steps:
1. Start cluster
{noformat}
echo -e "sc.parallelize(1 to 2000).foreach(_ => Thread.sleep(1000))" |
./bin/spark-shell --master yarn-client --executor-cores 1 --conf
spark.shuffle.service.enabled=true --conf spark.dynamicAllocation.enabled=true
--conf spark.dynamicAllocation.maxExecutors=2
{noformat}
2. Kill the AM process when a stage is scheduled.
was:
1. Start cluster
echo -e "sc.parallelize(1 to 2000).foreach(_ => Thread.sleep(1000))" |
./bin/spark-shell --master yarn-client --executor-cores 1 --conf
spark.shuffle.service.enabled=true --conf spark.dynamicAllocation.enabled=true
--conf spark.dynamicAllocation.maxExecutors=2
2. Kill the AM process when a stage is scheduled.
> Re-registration of AM hangs Spark cluster in yarn-client mode
> -------------------------------------------------------------
>
> Key: SPARK-20079
> URL: https://issues.apache.org/jira/browse/SPARK-20079
> Project: Spark
> Issue Type: Bug
> Components: YARN
> Affects Versions: 2.1.0
> Reporter: Guoqiang Li
>
> The ExecutorAllocationManager.reset method is called when the AM re-registers,
> which sets the ExecutorAllocationManager.initializing field to true. While this
> field is true, the driver does not start new executors in response to AM requests.
> Only two events set the field back to false:
> 1. An executor has been idle for some time.
> 2. A new stage is submitted.
> If the AM is killed and restarted after a stage has been submitted, neither
> event can occur:
> 1. When the AM is killed, YARN kills all running containers, so every executor
> is lost and none can become idle.
> 2. With no surviving executors, the current stage can never complete, so the
> DAGScheduler never submits a new stage.
> Reproduction steps:
> 1. Start cluster
> {noformat}
> echo -e "sc.parallelize(1 to 2000).foreach(_ => Thread.sleep(1000))" |
> ./bin/spark-shell --master yarn-client --executor-cores 1 --conf
> spark.shuffle.service.enabled=true --conf
> spark.dynamicAllocation.enabled=true --conf
> spark.dynamicAllocation.maxExecutors=2
> {noformat}
> 2. Kill the AM process when a stage is scheduled.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]