[
https://issues.apache.org/jira/browse/SPARK-19438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sean Owen resolved SPARK-19438.
-------------------------------
Resolution: Not A Problem
> executorDataMap should be guarded by
> CoarseGrainedSchedulerBackend.this.synchronized
> -------------------------------------------------------------------------------------
>
> Key: SPARK-19438
> URL: https://issues.apache.org/jira/browse/SPARK-19438
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.1.0
> Reporter: jin xing
>
> Currently, when handling *RegisterExecutor* in *CoarseGrainedSchedulerBackend*,
> *executorDataMap* is guarded by
> *CoarseGrainedSchedulerBackend.this.synchronized* only when it is updated, not
> when it is checked for an existing entry, which can leave
> *numPendingExecutors* incorrect.
> The code looks like this:
> {code}
> // The duplicate check runs without holding the lock.
> if (executorDataMap.contains(executorId)) {
>   executorRef.send(RegisterExecutorFailed("Duplicate executor ID: " + executorId))
>   context.reply(true)
> } else {
>   ...
>   // Only the update is guarded by the lock.
>   CoarseGrainedSchedulerBackend.this.synchronized {
>     executorDataMap.put(executorId, data)
>     if (currentExecutorIdCounter < executorId.toInt) {
>       currentExecutorIdCounter = executorId.toInt
>     }
>     if (numPendingExecutors > 0) {
>       numPendingExecutors -= 1
>       logDebug(s"Decremented number of pending executors ($numPendingExecutors left)")
>     }
>   }
> }
> {code}
> Consider SPARK-19437 and the following scenario:
> An executor sends *RegisterExecutor* twice via *askWithRetry*, with a very short
> interval between the two messages. Both may then take the *else* branch, so
> *numPendingExecutors* is decremented twice.
> Currently, *askWithRetry* for *RegisterExecutor* is used only in some unit
> tests, but it makes sense to make the handling of *RegisterExecutor* more
> robust.
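
For illustration only, the guard the reporter asks for could be sketched as below: the duplicate check and the update share one lock, so two concurrent registrations with the same ID cannot both reach the decrement. RegistrationTracker, register, pending, and RegistrationDemo are hypothetical names invented for this sketch, not Spark classes or methods, and the issue was resolved as Not A Problem, so this does not describe a change made in Spark.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.collection.mutable

// Hypothetical stand-in for the scheduler backend state; not Spark's class.
class RegistrationTracker(initialPending: Int) {
  private val executorDataMap = mutable.HashMap.empty[String, String]
  private var numPendingExecutors = initialPending

  // The check and the update run under the same lock, so a duplicate
  // registration is always seen as a duplicate.
  def register(executorId: String, data: String): Boolean = synchronized {
    if (executorDataMap.contains(executorId)) {
      false // duplicate executor ID: leave the counter untouched
    } else {
      executorDataMap.put(executorId, data)
      if (numPendingExecutors > 0) {
        numPendingExecutors -= 1
      }
      true
    }
  }

  def pending: Int = synchronized { numPendingExecutors }
}

object RegistrationDemo {
  // Registers the same executor ID from two threads and returns
  // (number of accepted registrations, remaining pending executors).
  def run(): (Int, Int) = {
    val tracker = new RegistrationTracker(2)
    val accepted = new AtomicInteger(0)
    val threads = (1 to 2).map { _ =>
      new Thread(new Runnable {
        def run(): Unit = {
          if (tracker.register("1", "host-a")) {
            accepted.incrementAndGet()
          }
        }
      })
    }
    threads.foreach(_.start())
    threads.foreach(_.join())
    (accepted.get(), tracker.pending)
  }
}
```

With both steps inside the lock, whichever thread runs second finds the ID already present, so only one registration is accepted and the pending count drops by one.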