[ 
https://issues.apache.org/jira/browse/SPARK-23485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16374708#comment-16374708
 ] 

Yinan Li commented on SPARK-23485:
----------------------------------

In the YARN case, yes, it's possible that a node is missing a jar commonly
needed by applications. In Kubernetes mode, this will never be the case
because either all containers have a particular jar locally or none of them
do. An image missing a dependency is a problem in itself. This consistency is
one of the benefits of being containerized. As for node problems, detecting
them and avoiding scheduling pods onto problematic nodes are the concerns of
the kubelets and the scheduler. Applications should not need to worry about
whether nodes are healthy. Node problems that occur at runtime cause pods to
be evicted from the problematic nodes and rescheduled elsewhere. Having
applications be responsible for keeping track of problematic nodes and
maintaining a blacklist means unnecessarily taking over the business of the
kubelets and the scheduler.
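
For illustration only, here is a rough sketch of what an application-side
blacklist would have to produce on Kubernetes: a node anti-affinity rule that
lists the bad hostnames in the pod spec. The helper name anti_affinity_for is
hypothetical and not part of Spark; the field names follow the Kubernetes pod
spec (nodeAffinity, matchExpressions, the kubernetes.io/hostname label), and
the sketch assumes each node carries that standard hostname label.

```python
def anti_affinity_for(blacklisted_nodes):
    """Build a pod-spec `affinity` fragment that excludes the given hostnames.

    Hypothetical helper for illustration; not part of Spark or any client library.
    """
    # De-duplicate and sort so the generated spec is stable across calls.
    hosts = sorted(set(blacklisted_nodes))
    if not hosts:
        return {}  # nothing blacklisted: add no scheduling constraint
    return {
        "nodeAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": {
                "nodeSelectorTerms": [
                    {
                        "matchExpressions": [
                            {
                                # Assumes nodes carry the standard hostname label.
                                "key": "kubernetes.io/hostname",
                                "operator": "NotIn",
                                "values": hosts,
                            }
                        ]
                    }
                ]
            }
        }
    }
```

The application would have to rebuild this fragment for every new executor
pod whenever its blacklist changes, which is the duplication of scheduler
work described above.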

[~foxish]

> Kubernetes should support node blacklist
> ----------------------------------------
>
>                 Key: SPARK-23485
>                 URL: https://issues.apache.org/jira/browse/SPARK-23485
>             Project: Spark
>          Issue Type: New Feature
>          Components: Kubernetes, Scheduler
>    Affects Versions: 2.3.0
>            Reporter: Imran Rashid
>            Priority: Major
>
> Spark's BlacklistTracker maintains a list of "bad nodes" which it will not 
> use for running tasks (e.g., because of bad hardware).  When running on YARN, 
> this blacklist is used to avoid ever allocating resources on blacklisted 
> nodes: 
> https://github.com/apache/spark/blob/e836c27ce011ca9aef822bef6320b4a7059ec343/resource-managers/yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnSchedulerBackend.scala#L128
> I'm just beginning to poke around the Kubernetes code, so apologies if this 
> is incorrect -- but I didn't see any references to 
> {{scheduler.nodeBlacklist()}} in {{KubernetesClusterSchedulerBackend}} so it 
> seems this is missing.  Thought of this while looking at SPARK-19755, a 
> similar issue on Mesos.



