[ https://issues.apache.org/jira/browse/SPARK-10790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14906787#comment-14906787 ]
Saisai Shao commented on SPARK-10790:
-------------------------------------

Hi [~jonathak], let me try to understand your scenario:

1. In your Spark cluster, dynamic allocation is enabled with the minimum and initial number of executors set, for example:

   spark.dynamicAllocation.minExecutors 2
   spark.dynamicAllocation.initialExecutors 3

2. You run a Spark job whose resource requirements are already satisfied by the current executors (no need to request new ones), for example:

   sc.parallelize(1 to 100, 1).collect()

   This job has only ONE task, so the current 2 executors (with 2 cores each) satisfy the resource requirement and there is no need to request new executors.

Is that the scenario you described? Assuming it is, I tested locally in my environment and saw no such "hang", nor the warning "Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources" that you mentioned:

1. Taking the above as an example, Spark already has 2 executors with 4 cores, so submitting the job will not hit the "Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources" problem, since the resources are sufficient.

2. "ExecutorAllocationManager.updateAndSyncNumExecutorsTarget() does not think that it needs to request any executors" is expected, since your current resources are sufficient.

3. Regarding "ExecutorAllocationManager does not request any executors while the application is still initializing": the initializing state ends once you submit a job. So by the time you submit a job, ExecutorAllocationManager's internal state is no longer "initializing", and it can ramp executors up and down according to load.

I'm not sure this is exactly your scenario; basically, I cannot reproduce your problem. Can you describe it more specifically?
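For reference, the scenario above could be set up with something like the following sketch (the app name and the programmatic-SparkConf style are my assumptions; the same settings are normally passed via spark-submit or spark-defaults.conf, and dynamic allocation also requires the external shuffle service):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Minimal sketch of the configuration described above; values match the example.
val conf = new SparkConf()
  .setAppName("dyn-alloc-repro") // hypothetical name
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.shuffle.service.enabled", "true")
  .set("spark.dynamicAllocation.minExecutors", "2")
  .set("spark.dynamicAllocation.initialExecutors", "3")
val sc = new SparkContext(conf)

// Single-task job: it fits within the initial executors, so dynamic
// allocation should not need to request any new ones.
sc.parallelize(1 to 100, 1).collect()
```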
> Dynamic Allocation does not request any executors if first stage needs less
> than or equal to spark.dynamicAllocation.initialExecutors
> -------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-10790
>                 URL: https://issues.apache.org/jira/browse/SPARK-10790
>             Project: Spark
>          Issue Type: Bug
>          Components: Scheduler
>    Affects Versions: 1.5.0
>            Reporter: Jonathan Kelly
>
> If you set spark.dynamicAllocation.initialExecutors > 0 (or
> spark.dynamicAllocation.minExecutors, since
> spark.dynamicAllocation.initialExecutors defaults to
> spark.dynamicAllocation.minExecutors), and the number of tasks in the first
> stage of your job is less than or equal to this min/init number of executors,
> dynamic allocation won't actually request any executors and will just hang
> indefinitely with the warning "Initial job has not accepted any resources;
> check your cluster UI to ensure that workers are registered and have
> sufficient resources".
>
> The cause appears to be that ExecutorAllocationManager does not request any
> executors while the application is still initializing, but it still sets the
> initial value of numExecutorsTarget to
> spark.dynamicAllocation.initialExecutors. Once the job is running and has
> submitted its first task, if the first task does not need more than
> spark.dynamicAllocation.initialExecutors,
> ExecutorAllocationManager.updateAndSyncNumExecutorsTarget() does not think
> that it needs to request any executors, so it doesn't.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
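The decision the issue description attributes to updateAndSyncNumExecutorsTarget() can be illustrated with a simplified sketch (this is a hypothetical model for illustration, not the actual Spark source; the object and method names here are my own):

```scala
// Simplified model of the allocation decision described in the issue:
// numExecutorsTarget starts at spark.dynamicAllocation.initialExecutors,
// even though no executors were actually requested during initialization.
object AllocationSketch {
  def executorsToRequest(numExecutorsTarget: Int, maxNeeded: Int): Int =
    // If the first stage's demand fits within the initial target, the
    // manager concludes nothing new is needed and requests 0 executors.
    if (maxNeeded <= numExecutorsTarget) 0
    else maxNeeded - numExecutorsTarget
}
```

With initialExecutors = 3 and a one-task first stage, maxNeeded (1) is not greater than the target (3), so zero executors are requested, matching the reported hang when none were ever launched.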