Hey all,
Thanks in advance. I ran into a situation where the Spark driver reduced the
total executor count for my job even with dynamic allocation disabled, which
caused the job to hang forever.
Setup:
Spark 1.3.1 on a Hadoop YARN 2.4.0 cluster.
All servers in the cluster run Linux kernel 2.6.32.
The job runs in yarn-client mode.
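For reference, here is an illustrative SparkConf matching this setup (the app
name and executor count are placeholders, not my actual values):

    import org.apache.spark.SparkConf

    // Illustrative only: dynamic allocation explicitly off, fixed executors.
    val conf = new SparkConf()
      .setMaster("yarn-client")                         // yarn-client mode
      .setAppName("my-job")                             // placeholder name
      .set("spark.dynamicAllocation.enabled", "false")  // explicitly disabled
      .set("spark.executor.instances", "10")            // fixed executor count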
Scenario:
1. The application is running with the required number of executors.
2. One of the DNs loses connectivity and is timed out.
3. Spark issues a killExecutor for the executor on the DN that timed out.
4. Even with dynamic allocation off, Spark's driver reduces
"targetNumExecutors".
On analysing the code (Spark 1.3.1):
When my DN goes unreachable:
Spark core's HeartbeatReceiver invokes expireDeadHosts(), which checks
whether dynamic allocation is supported and then invokes "sc.killExecutor()":
    if (sc.supportDynamicAllocation) {
      sc.killExecutor(executorId)
    }
Surprisingly, supportDynamicAllocation in SparkContext.scala is defined to
return true if the dynamicAllocationTesting flag is set or if Spark is
running on YARN:
    private[spark] def supportDynamicAllocation =
      master.contains("yarn") || dynamicAllocationTesting
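To illustrate with a standalone sketch (not Spark source; the values just
mirror my setup): in yarn-client mode the master string contains "yarn", so
this guard is true no matter what dynamic allocation is set to:

    // Standalone demo of the guard quoted above.
    object SupportDynamicAllocationDemo {
      def main(args: Array[String]): Unit = {
        val master = "yarn-client"            // our deploy mode
        val dynamicAllocationTesting = false  // testing flag never enabled
        val supportDynamicAllocation =
          master.contains("yarn") || dynamicAllocationTesting
        println(supportDynamicAllocation)     // prints "true"
      }
    }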
"sc.killExecutor()" matches it to configured "schedulerBackend"
(CoarseGrainedSchedulerBackend in this case) and invokes
"killExecutors(executorIds)"
CoarseGrainedSchedulerBackend calculates a "newTotal" for the total number
of executors required and sends an update to the application master by
invoking "doRequestTotalExecutors(newTotal)". It then invokes
"doKillExecutors(filteredExecutorIds)" for the lost executors.
Thus the total executor count is reduced whenever a host is intermittently
unreachable, and it is never restored.
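To make the accounting concrete, here is a self-contained model of how I
read the 1.3.1 "killExecutors" path (names mirror the backend's fields, but
this is a paraphrase, not the actual source):

    // Model of the executor accounting in killExecutors (paraphrase).
    object KillExecutorsAccounting {
      def main(args: Array[String]): Unit = {
        val numExistingExecutors = 10   // registered executors
        val numPendingExecutors = 0     // outstanding requests to YARN
        var executorsPendingToRemove = Set.empty[String]

        // The heartbeat-expired executor passed in by expireDeadHosts():
        val filteredExecutorIds = Seq("executor-7")

        // The target is lowered so the backend will not ask YARN for a
        // replacement for the executor it is about to kill...
        val newTotal = numExistingExecutors + numPendingExecutors -
          executorsPendingToRemove.size - filteredExecutorIds.size
        println(s"doRequestTotalExecutors($newTotal)")  // 9, not 10

        // ...and doKillExecutors(filteredExecutorIds) then kills it,
        // leaving the job one executor short.
        executorsPendingToRemove ++= filteredExecutorIds
      }
    }

Since nothing raises the target again when dynamic allocation is off, the job
stays short of executors, which matches the hang I am seeing.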
I noticed that this change to "CoarseGrainedSchedulerBackend" was introduced
while fixing https://issues.apache.org/jira/browse/SPARK-6325.
<https://issues.apache.org/jira/browse/SPARK-6325>
I am new to this code, so it would be a great help if any of you could
comment on why we need "doRequestTotalExecutors" in "killExecutors". Also,
why is "supportDynamicAllocation" true even when I have not enabled dynamic
allocation?
Regards,
Prakhar.