[ https://issues.apache.org/jira/browse/SPARK-19941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15932088#comment-15932088 ]

Saisai Shao commented on SPARK-19941:
-------------------------------------

I think this scenario is quite similar to container preemption. In the 
container preemption scenario, the AM is informed by the RM which containers 
will be preempted in the next 15 seconds (by default), and the AM can react 
based on that information.
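
For context, here is a minimal sketch (not actual Spark code; the helper name 
is made up) of how an AM could pull the to-be-preempted containers out of the 
heartbeat response, using Hadoop's AllocateResponse/PreemptionMessage API:

{code:scala}
import scala.collection.JavaConverters._
import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse
import org.apache.hadoop.yarn.api.records.ContainerId

// Hypothetical helper: after each AM-RM heartbeat, collect the containers
// the RM plans to preempt so the scheduler can stop assigning work to them.
def containersToBePreempted(response: AllocateResponse): Set[ContainerId] = {
  Option(response.getPreemptionMessage) match {
    case Some(msg) =>
      // Strict contract: these containers WILL be killed; no alternatives.
      val strict = Option(msg.getStrictContract)
        .map(_.getContainers.asScala.map(_.getId).toSet)
        .getOrElse(Set.empty[ContainerId])
      // Negotiable contract: the AM may release other resources instead.
      val negotiable = Option(msg.getContract)
        .map(_.getContainers.asScala.map(_.getId).toSet)
        .getOrElse(Set.empty[ContainerId])
      strict ++ negotiable
    case None => Set.empty
  }
}
{code}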

I made a similar PR to avoid scheduling tasks on executors that are about to 
be preempted, but it was ultimately rejected; the main objection was that 
leaving to-be-preempted executors idle for 15 seconds is too long and wastes 
resources. In your description the executors will be idle for 60 seconds 
before decommission, which is a real waste of resources if most of the work 
could have finished on those executors within that minute.
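
Roughly, what that PR did was filter the resource offers before task 
assignment. A simplified, self-contained sketch (the names below are 
stand-ins for Spark's internal WorkerOffer/TaskSchedulerImpl, not the real 
API):

{code:scala}
// Hypothetical stand-in for Spark's internal WorkerOffer.
case class Offer(executorId: String, host: String, cores: Int)

// Drop offers from executors whose containers the RM has marked for
// preemption, so no new tasks land on them during the grace period.
def usableOffers(offers: Seq[Offer], preempted: Set[String]): Seq[Offer] =
  offers.filterNot(o => preempted.contains(o.executorId))
{code}

The objection to this approach is exactly the one above: it leaves those 
executors idle for the whole grace period.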

Also, I'm not sure why the job would hang as you mentioned before; I would 
expect the failed tasks to simply be rerun.

So IMHO it is better not to handle this scenario unless we actually run into 
serious problems. Sometimes the cost of rerunning tasks is smaller than the 
cost of wasting resources.

> Spark should not schedule tasks on executors on decommissioning YARN nodes
> --------------------------------------------------------------------------
>
>                 Key: SPARK-19941
>                 URL: https://issues.apache.org/jira/browse/SPARK-19941
>             Project: Spark
>          Issue Type: Improvement
>          Components: Scheduler, YARN
>    Affects Versions: 2.1.0
>         Environment: Hadoop 2.8.0-rc1
>            Reporter: Karthik Palaniappan
>
> Hadoop 2.8 added a mechanism to gracefully decommission Node Managers in 
> YARN: https://issues.apache.org/jira/browse/YARN-914
> Essentially you can mark nodes to be decommissioned, and let them a) finish 
> work in progress and b) finish serving shuffle data. But no new work will be 
> scheduled on the node.
> Spark should respect when NMs are being decommissioned, and similarly 
> decommission executors on those nodes by not scheduling any more tasks on 
> them.
> It looks like in the future YARN may inform the app master when containers 
> will be killed: https://issues.apache.org/jira/browse/YARN-3784. However, I 
> don't think Spark should schedule based on a timeout. We should gracefully 
> decommission the executor as fast as possible (which is the spirit of 
> YARN-914). The app master can query the RM for NM statuses (if it doesn't 
> already have them) and stop scheduling on executors on NMs that are 
> decommissioning.
> Stretch feature: The timeout may be useful in determining whether running 
> further tasks on the executor is even helpful. Spark may be able to tell that 
> shuffle data will not be consumed by the time the node is decommissioned, so 
> it is not worth computing. The executor can be killed immediately.
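
As a concrete illustration of the reporter's suggestion to query the RM for 
NM statuses, a minimal sketch (assuming Hadoop 2.8's standard YarnClient API 
and the NodeState.DECOMMISSIONING state added by YARN-914; illustrative only, 
not an actual Spark patch):

{code:scala}
import scala.collection.JavaConverters._
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.yarn.api.records.NodeState
import org.apache.hadoop.yarn.client.api.YarnClient

// Ask the RM for all nodes currently in the DECOMMISSIONING state and
// return their hostnames, so the scheduler can stop placing tasks on
// executors running there.
def decommissioningHosts(conf: Configuration): Set[String] = {
  val client = YarnClient.createYarnClient()
  client.init(conf)
  client.start()
  try {
    client.getNodeReports(NodeState.DECOMMISSIONING)
      .asScala
      .map(_.getNodeId.getHost)
      .toSet
  } finally {
    client.stop()
  }
}
{code}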
