[ https://issues.apache.org/jira/browse/SPARK-21040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974162#comment-16974162 ]

Prakhar Jain commented on SPARK-21040:
--------------------------------------

Hi [~holden], at Microsoft we are facing the same issue while adding support 
for low-priority VMs, and we are working along similar lines.

We have considered the following options:
Option 1) Whenever an executor enters the decommissioning state, consider 
all tasks currently running on that executor for speculation (without 
applying "spark.speculation.quantile" or "spark.speculation.multiplier"). 
A rough sketch of this is below.
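
To make Option 1 concrete, here is a small, self-contained Scala sketch. The 
TaskInfo class and method names are illustrative stand-ins, not Spark's 
internal scheduler APIs:

{code:scala}
// Hypothetical sketch of Option 1: once an executor is decommissioning,
// every unfinished task still running on it becomes a speculation
// candidate, bypassing the usual quantile/multiplier gates.
case class TaskInfo(taskId: Long, executorId: String, finished: Boolean)

object Option1Speculation {
  def speculatableTasks(
      runningTasks: Seq[TaskInfo],
      decommissioningExecutors: Set[String]): Seq[TaskInfo] = {
    // Eligible regardless of spark.speculation.quantile / multiplier.
    runningTasks.filter(t =>
      !t.finished && decommissioningExecutors.contains(t.executorId))
  }
}
{code}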

Option 2) Whenever an executor enters the decommissioning state, check the 
following for each task running on that executor (a rough sketch follows 
the list):

  - Check whether X% of the tasks in the corresponding stage have finished, 
and compute the median task duration from them.
  - If (MedianTime - RunTimeOfTaskInConsideration) > cloud_threshold, then 
consider the task for speculation. cloud_threshold can be set as a 
configuration parameter (e.g. 120 seconds for AWS Spot instances).
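
A rough Scala sketch of the Option 2 check, again with illustrative names 
and default values (quantile, cloudThresholdMs) rather than Spark's actual 
configuration plumbing:

{code:scala}
// Hypothetical sketch of Option 2: speculate a task on a decommissioning
// executor only if enough of the stage has finished to trust the median,
// and the task's expected remaining time exceeds a cloud-specific threshold.
object Option2Speculation {
  def shouldSpeculate(
      taskRuntimeMs: Long,
      finishedDurationsMs: Seq[Long],
      totalTasksInStage: Int,
      quantile: Double = 0.75,          // analogous to spark.speculation.quantile
      cloudThresholdMs: Long = 120000L  // e.g. 120s for Spot/low-priority VMs
  ): Boolean = {
    // Require X% of the stage to be done before using the median.
    val enoughFinished = finishedDurationsMs.size >= quantile * totalTasksInStage
    if (!enoughFinished || finishedDurationsMs.isEmpty) {
      false
    } else {
      val sorted = finishedDurationsMs.sorted
      val medianMs = sorted(sorted.size / 2)
      // Speculate only if the task likely still has a long way to run.
      (medianMs - taskRuntimeMs) > cloudThresholdMs
    }
  }
}
{code}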


What are your thoughts on these approaches?

> On executor/worker decommission consider speculatively re-launching current 
> tasks
> ---------------------------------------------------------------------------------
>
>                 Key: SPARK-21040
>                 URL: https://issues.apache.org/jira/browse/SPARK-21040
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Spark Core
>    Affects Versions: 3.0.0
>            Reporter: Holden Karau
>            Priority: Major
>
> If speculative execution is enabled we may wish to consider the decommissioning 
> of a worker as a weight for speculative execution.


