[ https://issues.apache.org/jira/browse/SPARK-23485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16374778#comment-16374778 ]

Stavros Kontopoulos edited comment on SPARK-23485 at 2/23/18 6:59 PM:
----------------------------------------------------------------------

How about locality preferences combined with a hardware problem, like the disk
issue? I see code in the Spark Kubernetes scheduler related to locality (not
sure if it is complete). Will that problem be detected, and will the Kubernetes
scheduler consider the node problematic? If so, then I guess there is no need
for blacklisting in such scenarios. If, however, this cannot be detected and the
task keeps failing but there is a locality preference, what will happen?
Kubernetes should not just re-run things elsewhere simply because there was a
failure. The reason for a failure matters: is it an application failure or
something lower level? (I am new to Kubernetes, so I missed the fact that taints
are applied cluster-wide; we just need a similar feature, such as the node
anti-affinity already mentioned.)
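
For concreteness, here is a minimal sketch of the node anti-affinity idea in
Scala, using fabric8's Kubernetes client (the library the Spark Kubernetes
backend is built on). The helper name and the generated builder method names
({{withNewNodeAffinity}}, {{addNewMatchExpression}}, and so on) are my
assumptions about the fluent API, not code taken from Spark:

{code:scala}
import io.fabric8.kubernetes.api.model.{Affinity, AffinityBuilder}

// Sketch: turn the set of hosts Spark currently considers bad into a required
// node anti-affinity term. "kubernetes.io/hostname" is the standard label the
// kubelet puts on every node; the "NotIn" operator keeps the pod off the
// listed hosts while leaving the rest of the cluster untouched.
def blacklistAsAntiAffinity(blacklistedNodes: Set[String]): Option[Affinity] = {
  if (blacklistedNodes.isEmpty) {
    None
  } else {
    Some(new AffinityBuilder()
      .withNewNodeAffinity()
        .withNewRequiredDuringSchedulingIgnoredDuringExecution()
          .addNewNodeSelectorTerm()
            .addNewMatchExpression()
              .withKey("kubernetes.io/hostname")
              .withOperator("NotIn")
              .withValues(blacklistedNodes.toSeq: _*)
            .endMatchExpression()
          .endNodeSelectorTerm()
        .endRequiredDuringSchedulingIgnoredDuringExecution()
      .endNodeAffinity()
      .build())
  }
}
{code}

Unlike a taint, which repels pods from every application in the cluster, an
anti-affinity term like this travels only with the pods of the application that
distrusts the node, which matches the per-application nature of Spark's
blacklist.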

 


> Kubernetes should support node blacklist
> ----------------------------------------
>
>                 Key: SPARK-23485
>                 URL: https://issues.apache.org/jira/browse/SPARK-23485
>             Project: Spark
>          Issue Type: New Feature
>          Components: Kubernetes, Scheduler
>    Affects Versions: 2.3.0
>            Reporter: Imran Rashid
>            Priority: Major
>
> Spark's BlacklistTracker maintains a list of "bad nodes" which it will not 
> use for running tasks (e.g., because of bad hardware).  When running on YARN, 
> this blacklist is used to avoid ever allocating resources on blacklisted 
> nodes: 
> https://github.com/apache/spark/blob/e836c27ce011ca9aef822bef6320b4a7059ec343/resource-managers/yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnSchedulerBackend.scala#L128
> I'm just beginning to poke around the Kubernetes code, so apologies if this 
> is incorrect -- but I didn't see any references to 
> {{scheduler.nodeBlacklist()}} in {{KubernetesClusterSchedulerBackend}}, so it 
> seems this is missing.  Thought of this while looking at SPARK-19755, a 
> similar issue on Mesos.
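
For comparison, the YARN backend linked above simply reads
{{scheduler.nodeBlacklist()}} when it prepares an executor request. A minimal
sketch of what an analogous hook could look like on the Kubernetes side,
reusing the {{blacklistAsAntiAffinity}} helper from the earlier sketch (the
{{basePod}} template and the builder method names are assumptions, not the
actual {{KubernetesClusterSchedulerBackend}} code):

{code:scala}
import io.fabric8.kubernetes.api.model.{Pod, PodBuilder}

// Sketch, assumed to live inside the scheduler backend where `scheduler`
// is the driver's TaskSchedulerImpl: apply the current blacklist to each
// newly requested executor pod as a required node anti-affinity.
def executorPodAvoidingBadNodes(basePod: Pod): Pod = {
  val blacklisted: Set[String] = scheduler.nodeBlacklist()
  blacklistAsAntiAffinity(blacklisted) match {
    case Some(affinity) =>
      new PodBuilder(basePod)
        .editOrNewSpec()
          .withAffinity(affinity)
        .endSpec()
        .build()
    case None => basePod
  }
}
{code}

Since a pod's affinity cannot be changed once it is scheduled, the blacklist
would have to be consulted at pod-creation time, as above, rather than baked
into a single static template.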


