[ https://issues.apache.org/jira/browse/SPARK-15815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15333198#comment-15333198 ]
SuYan commented on SPARK-15815:
-------------------------------

Eh... yes, there is still uncertainty about getting another executor. How long should we wait before deciding to abort the task set — 5 minutes? 10 minutes? For me, the primary goal is to make sure the job finishes successfully rather than failing and throwing away the sunk cost, so I would prefer to remove the conditions that make the job hang: for example, make dynamic allocation active again, kill the blacklisted executor and request a new one, or wait for an executor to be allocated even if that takes some time due to a resource shortage.

> Hang while enable blacklistExecutor and DynamicExecutorAllocator
> -----------------------------------------------------------------
>
>                 Key: SPARK-15815
>                 URL: https://issues.apache.org/jira/browse/SPARK-15815
>             Project: Spark
>          Issue Type: Bug
>          Components: Scheduler, Spark Core
>    Affects Versions: 1.6.1
>            Reporter: SuYan
>            Priority: Minor
>
> Enable executor blacklisting with a blacklist time larger than 120s, and enable dynamic allocation with minExecutors = 0.
> 1. Assume only one task is left running, on executor A, and all other executors have timed out.
> 2. The task fails, so it will not be scheduled on executor A again because of the blacklist time.
> 3. ExecutorAllocationManager keeps requesting targetNumExecutors = 1. Because we already have executor A, oldTargetNumExecutor == targetNumExecutor == 1, so no more executors are ever added, even after executor A times out. It becomes an endless request for delta = 0 executors.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
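The stalemate in step 3 can be sketched with a small model. This is not the actual ExecutorAllocationManager code (which lives in Spark's Scala core); the function name and parameters below are illustrative simplifications, assuming the manager only requests the difference between its target and what it already holds.

```python
def executors_to_request(current_executors, pending_tasks):
    """Illustrative model: the manager's target is capped by outstanding
    work, and it only asks the cluster manager for the positive delta
    between that target and the executors it already has."""
    target = min(1, pending_tasks)  # one task left -> target stays 1
    delta = target - current_executors
    return max(delta, 0)

# Executor A is alive but blacklisted for the one remaining task.
# The manager still counts A toward the target, so it endlessly
# requests a delta of 0 new executors and the job hangs:
print(executors_to_request(current_executors=1, pending_tasks=1))  # 0

# Only if A were released (e.g. killed) would a real request go out:
print(executors_to_request(current_executors=0, pending_tasks=1))  # 1
```

This is why the proposals above (killing the blacklisted executor, or re-activating dynamic allocation) break the deadlock: they change `current_executors` so the delta becomes positive again.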