Anirudh Ramanathan commented on SPARK-23485:

Stavros - we do currently distinguish between Kubernetes causing an executor to 
disappear (node failure) and an exit caused by the application itself.
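
For illustration, here's a minimal sketch of how that distinction can be drawn 
from an executor pod's status, in Scala against the fabric8 Kubernetes model the 
backend already depends on. The object name, the helper, and the exact 
heuristics are mine, not Spark's actual code:

{code:scala}
// Illustrative sketch only: classify why an executor pod went away.
// An "Evicted" pod, or one with no terminated container, points at the
// node/cluster; a terminated container with an exit code points at the app.
import io.fabric8.kubernetes.api.model.Pod
import scala.collection.JavaConverters._

object ExecutorExitClassifier {
  sealed trait ExitCause
  case object NodeOrClusterIssue extends ExitCause
  case class ApplicationExit(exitCode: Int) extends ExitCause

  def classify(pod: Pod): ExitCause = {
    val status = pod.getStatus
    val terminatedExitCodes = status.getContainerStatuses.asScala
      .flatMap(cs => Option(cs.getState).flatMap(s => Option(s.getTerminated)))
      .map(_.getExitCode.intValue())

    if (status.getReason == "Evicted" || terminatedExitCodes.isEmpty) {
      NodeOrClusterIssue
    } else {
      ApplicationExit(terminatedExitCodes.max)
    }
  }
}
{code}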

Here's some detail on node issues and k8s:

Node-level problem detection is split between the Kubelet and the [Node 
Problem Detector|https://github.com/kubernetes/node-problem-detector]. This 
works for some common errors and, in the future, will taint nodes upon detecting 
them. Some of these errors are listed in that project's documentation. However, 
there are some categories of errors this setup won't detect. For example: a node 
whose firewall rules or networking prevent it from accessing a particular 
external service, say to download or stream data; or a node with a faulty local 
disk that causes read/write errors. These error conditions may affect only 
certain kinds of pods on that node and not others.
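
As a concrete illustration of reading those conditions from an application, 
here's a minimal sketch, assuming a fabric8 KubernetesClient (which the Spark 
K8s backend already holds); the object name and the condition filter are 
assumptions, since condition types beyond "Ready" vary with how the 
node-problem-detector is configured:

{code:scala}
// List nodes whose reported conditions look unhealthy: "Ready" not True,
// or any other (problem) condition reporting True.
import io.fabric8.kubernetes.client.KubernetesClient
import scala.collection.JavaConverters._

object UnhealthyNodes {
  def find(client: KubernetesClient): Seq[String] = {
    client.nodes().list().getItems.asScala
      .filter { node =>
        node.getStatus.getConditions.asScala.exists { cond =>
          (cond.getType == "Ready" && cond.getStatus != "True") ||
            (cond.getType != "Ready" && cond.getStatus == "True")
        }
      }
      .map(_.getMetadata.getName)
      .toSeq
  }
}
{code}

Even then, errors like the firewall example above would not surface as a node 
condition, which is why application-level blacklisting still has value.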

Yinan's point, I think, is that it is uncommon for applications on k8s to 
incorporate reasoning about node-level conditions. The general expectation is 
that a failure on a given node will simply cause new executors to spin up on 
different nodes, and the application will eventually succeed. However, I can see 
this being an issue in large-scale production deployments, where we'd hit 
transient errors like the ones above. Given that Spark already has a blacklist 
mechanism and Kubernetes has anti-affinity primitives, wiring the two together 
shouldn't be too complex, I think; a rough sketch follows.
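
To make that concrete, here's a rough, untested sketch of turning the 
driver-side node blacklist into a required nodeAffinity "NotIn" rule on 
kubernetes.io/hostname for new executor pods, using the fabric8 model builders 
(builder method names may differ slightly across client versions, and the wiring 
into KubernetesClusterSchedulerBackend is omitted):

{code:scala}
// Express blacklisted node names as a required node anti-affinity so the
// kube-scheduler never places new executors on those nodes.
import io.fabric8.kubernetes.api.model._

object BlacklistAffinity {
  def forNodes(blacklistedNodes: Set[String]): Option[Affinity] = {
    if (blacklistedNodes.isEmpty) {
      None
    } else {
      val notInBlacklist = new NodeSelectorRequirementBuilder()
        .withKey("kubernetes.io/hostname")
        .withOperator("NotIn")
        .withValues(blacklistedNodes.toSeq: _*)
        .build()
      val term = new NodeSelectorTermBuilder()
        .withMatchExpressions(notInBlacklist)
        .build()
      val selector = new NodeSelectorBuilder()
        .withNodeSelectorTerms(term)
        .build()
      val nodeAffinity = new NodeAffinityBuilder()
        .withRequiredDuringSchedulingIgnoredDuringExecution(selector)
        .build()
      Some(new AffinityBuilder().withNodeAffinity(nodeAffinity).build())
    }
  }
}
{code}

The resulting Affinity would be attached to each executor pod spec before 
creation, so blacklisted nodes are simply never candidates for scheduling.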

[~aash] [~mcheah], have you guys seen this in practice thus far? 

> Kubernetes should support node blacklist
> ----------------------------------------
>                 Key: SPARK-23485
>                 URL: https://issues.apache.org/jira/browse/SPARK-23485
>             Project: Spark
>          Issue Type: New Feature
>          Components: Kubernetes, Scheduler
>    Affects Versions: 2.3.0
>            Reporter: Imran Rashid
>            Priority: Major
> Spark's BlacklistTracker maintains a list of "bad nodes" which it will not 
> use for running tasks (eg., because of bad hardware).  When running in yarn, 
> this blacklist is used to avoid ever allocating resources on blacklisted 
> nodes: 
> https://github.com/apache/spark/blob/e836c27ce011ca9aef822bef6320b4a7059ec343/resource-managers/yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnSchedulerBackend.scala#L128
> I'm just beginning to poke around the kubernetes code, so apologies if this 
> is incorrect -- but I didn't see any references to 
> {{scheduler.nodeBlacklist()}} in {{KubernetesClusterSchedulerBackend}} so it 
> seems this is missing.  Thought of this while looking at SPARK-19755, a 
> similar issue on mesos.
