[ https://issues.apache.org/jira/browse/SPARK-21656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16117204#comment-16117204 ]
Thomas Graves commented on SPARK-21656:
---------------------------------------

Another option would be just to add logic for Spark to check at some point whether it should try reacquiring some executors. All of that, though, seems like more logic than just not letting them go.

To me, Spark needs to be more resilient about this and should handle the various possible conditions. Users shouldn't have to tune every single job to account for weird things happening. Note that if dynamic allocation is off, this doesn't happen. So why is the user getting a worse experience in this case?

> spark dynamic allocation should not idle timeout executors when tasks still to run
> ----------------------------------------------------------------------------------
>
>                 Key: SPARK-21656
>                 URL: https://issues.apache.org/jira/browse/SPARK-21656
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.1.1
>            Reporter: Jong Yoon Lee
>             Fix For: 2.1.1
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Right now Spark lets go of executors when they have been idle for 60s (or a
> configurable time). I have seen Spark let them go when they were idle but
> really needed. I have seen this issue when the scheduler was waiting to
> get node locality, but that takes longer than the default idle timeout. In
> these jobs the number of executors drops very low (less than 10) while
> there are still around 80,000 tasks to run.
> We should consider not allowing executors to idle timeout if they are still
> needed according to the number of tasks to be run.
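
The proposal in the issue description (do not idle-timeout an executor while the outstanding tasks still need it) can be illustrated with a small sketch. This is not Spark's code: the function names, the parameters, and the figure of 4 task slots per executor are all hypothetical, chosen only to show the shape of the check.

```python
import math


def executors_needed(pending_tasks: int, running_tasks: int,
                     tasks_per_executor: int) -> int:
    """Hypothetical helper: executors required to run every outstanding
    task at once, given a fixed number of task slots per executor."""
    if tasks_per_executor <= 0:
        raise ValueError("tasks_per_executor must be positive")
    return math.ceil((pending_tasks + running_tasks) / tasks_per_executor)


def may_idle_timeout(idle_seconds: float, idle_timeout_seconds: float,
                     current_executors: int, pending_tasks: int,
                     running_tasks: int, tasks_per_executor: int) -> bool:
    """Allow removal only if the executor has passed the idle timeout AND
    the application holds more executors than the outstanding tasks need."""
    if idle_seconds < idle_timeout_seconds:
        return False
    needed = executors_needed(pending_tasks, running_tasks, tasks_per_executor)
    return current_executors > needed


# Scenario from the report: about 10 executors left, 80,000 tasks pending,
# 60s idle timeout. The 4 slots per executor is an assumed value.
print(may_idle_timeout(61, 60, 10, 80_000, 0, 4))  # executor is still needed
print(may_idle_timeout(61, 60, 10, 0, 0, 4))       # nothing left to run
```

With a check of this kind, an executor that sits idle only because the scheduler is waiting on node locality would be kept, since the pending task count still exceeds what the remaining executors can run.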