[ https://issues.apache.org/jira/browse/SPARK-21040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974162#comment-16974162 ]
Prakhar Jain commented on SPARK-21040:
--------------------------------------

Hi [~holden],

At Microsoft we are facing the same issue while adding support for low-priority VMs, and we are working along similar lines. We have considered the following options:

Option 1) Whenever an executor enters the decommissioning state, consider all tasks running on that executor for speculation (without regard to "spark.speculation.quantile" or "spark.speculation.multiplier").

Option 2) Whenever an executor enters the decommissioning state, check the following for each task running on it:
- Check whether X% of the tasks in the corresponding stage have finished, and compute the median task time.
- If (MedianTime - RunTimeOfTaskInConsideration) > cloud_threshold, consider the task for speculation. cloud_threshold can be set as a configuration parameter (e.g. 120 seconds for AWS spot instances).

What are your thoughts on this?

> On executor/worker decommission consider speculatively re-launching current
> tasks
> ---------------------------------------------------------------------------------
>
>                 Key: SPARK-21040
>                 URL: https://issues.apache.org/jira/browse/SPARK-21040
>             Project: Spark
>          Issue Type: Sub-task
>      Components: Spark Core
>  Affects Versions: 3.0.0
>            Reporter: Holden Karau
>            Priority: Major
>
> If speculative execution is enabled we may wish to consider decommissioning
> of a worker as a weight for speculative execution.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
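The heuristic proposed in Option 2 above can be sketched roughly as follows. This is an illustrative pseudocode-style sketch in Python, not Spark scheduler code; the function name `should_speculate`, and the `quantile` and `cloud_threshold` parameters, are hypothetical names standing in for the X% check and the proposed configuration parameter.

```python
import statistics

# Hypothetical sketch of Option 2's decision rule; not a Spark API.
def should_speculate(task_runtime, finished_durations, total_tasks,
                     quantile=0.75, cloud_threshold=120.0):
    """Decide whether a task on a decommissioning executor should be
    considered for speculative re-launch.

    task_runtime       -- seconds the task has been running so far
    finished_durations -- durations (s) of already-finished tasks in the stage
    total_tasks        -- total number of tasks in the stage
    quantile           -- fraction of the stage that must have finished (X%)
    cloud_threshold    -- e.g. 120 s for AWS spot instances
    """
    # Require X% of the stage's tasks to have finished before estimating.
    if len(finished_durations) < quantile * total_tasks:
        return False
    median_time = statistics.median(finished_durations)
    # A task whose elapsed time is far below the median likely has a lot of
    # work left and will not finish before the executor is decommissioned,
    # so re-launching it elsewhere is worthwhile.
    return (median_time - task_runtime) > cloud_threshold
```

Under this rule, a task that started recently on a decommissioning executor (large `median_time - task_runtime`) is re-launched, while a task that is already close to the median runtime is left to finish in place.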