[ 
https://issues.apache.org/jira/browse/SPARK-10790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14906969#comment-14906969
 ] 

Saisai Shao commented on SPARK-10790:
-------------------------------------

Hi [~jonathak], I think I reproduced the problem you mentioned. I don't think 
it is caused by SPARK-7699; in fact it was introduced by SPARK-9092, which 
does not consider the case where the initial executor count is not set. 

Also what you mentioned in the mail about:

{quote}
Then on this line, it returns numExecutorsTarget (1) - oldNumExecutorsTarget 
(still 1, even though there aren't any executors running yet) = 0, for the 
number of executors it should request. Then the app hangs forever because it 
never requests any executors.
{quote}

I think you may have slightly misunderstood the code. The return value 0 here 
is not used anywhere: if the current target number of executors equals the old 
target, Spark simply does not send a resource request to the AM. If the current 
target is less than the old target, Spark updates the resource requests to the 
AM to try to ramp down some pending container requests. You might want to dig 
deeper into the code :).
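
The branching described above can be sketched as follows. This is a minimal 
Python illustration of the logic, not the actual Spark source; the parameter 
names mirror ExecutorAllocationManager's fields for readability:

```python
def sync_target_delta(num_executors_target, old_num_executors_target):
    """Sketch of the target-sync branching in updateAndSyncNumExecutorsTarget.

    Illustration only (not the real Spark code): names mirror
    ExecutorAllocationManager's fields.
    """
    if num_executors_target < old_num_executors_target:
        # Ramp down: resend the lower target so the AM can cancel
        # some pending container requests.
        return num_executors_target - old_num_executors_target  # negative
    elif num_executors_target > old_num_executors_target:
        # Ramp up: request additional executors from the AM.
        return num_executors_target - old_num_executors_target  # positive
    else:
        # Targets are equal: no request is sent at all, so the
        # returned 0 is never consumed by anything.
        return 0
```

This is why a returned 0 does not by itself cause the hang: in the equal-targets branch no request is sent in the first place.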

Yeah, it is actually a bug; let me fix it.

> Dynamic Allocation does not request any executors if first stage needs less 
> than or equal to spark.dynamicAllocation.initialExecutors
> -------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-10790
>                 URL: https://issues.apache.org/jira/browse/SPARK-10790
>             Project: Spark
>          Issue Type: Bug
>          Components: Scheduler
>    Affects Versions: 1.5.0
>            Reporter: Jonathan Kelly
>
> If you set spark.dynamicAllocation.initialExecutors > 0 (or 
> spark.dynamicAllocation.minExecutors, since 
> spark.dynamicAllocation.initialExecutors defaults to 
> spark.dynamicAllocation.minExecutors), and the number of tasks in the first 
> stage of your job is less than or equal to this min/init number of executors, 
> dynamic allocation won't actually request any executors and will just hang 
> indefinitely with the warning "Initial job has not accepted any resources; 
> check your cluster UI to ensure that workers are registered and have 
> sufficient resources".
> The cause appears to be that ExecutorAllocationManager does not request any 
> executors while the application is still initializing, but it still sets the 
> initial value of numExecutorsTarget to 
> spark.dynamicAllocation.initialExecutors. Once the job is running and has 
> submitted its first task, if the first task does not need more than 
> spark.dynamicAllocation.initialExecutors, 
> ExecutorAllocationManager.updateAndSyncNumExecutorsTarget() does not think 
> that it needs to request any executors, so it doesn't.
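
The scenario in the description above can be worked through numerically. This is a hypothetical sketch (the variable names are illustrative, and the config names match the real spark.dynamicAllocation.* settings):

```python
# Reported scenario: first stage needs no more tasks than the
# initial executor target that was seeded before any request went out.
initial_executors = 2           # spark.dynamicAllocation.initialExecutors
tasks_in_first_stage = 2        # max executors the first stage needs

old_target = initial_executors  # numExecutorsTarget seeded at startup,
                                # even though no executors were requested
new_target = max(tasks_in_first_stage, 0)  # <= initial_executors

# new_target equals old_target, so no resource request is ever sent,
# even though zero executors are actually running -> the job hangs.
assert new_target - old_target == 0
```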



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
