[
https://issues.apache.org/jira/browse/SPARK-24135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16462139#comment-16462139
]
Anirudh Ramanathan commented on SPARK-24135:
--------------------------------------------
It is increasingly common for people to write custom controllers and custom
resources and not use the built-in controllers, especially when the workloads
have special characteristics. This is the whole reason why people are working
on tooling like the [operator
framework|https://coreos.com/blog/introducing-operator-framework]. I don't
think the future lies in shoehorning applications to use the existing
controllers. The existing controllers are a good starting point but for any
custom orchestration, the recommendation from the k8s community at large would
be to write an operator which in some sense is what we've done. So, I think
moving towards the built-in controllers doesn't give us anything more.
Also, replication controllers and deployments are not used for applications
with termination semantics. They're suitable for long running services. The
only "batch" type built-in controller is the [job
controller|https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion],
which does implement a backoff policy that covers the initialization and
runtime errors in containers. As I see it, we should have safe limits for all
kinds of failures to eventually give up; it's more a question of whether this
should be treated differently as a framework error.
Also, flakiness due to admission webhooks seems like it should be handled by
retries in the init container, or by some other automation, since it's outside
Spark land. That makes me apprehensive about handling such specific cases
within Spark, instead of dealing with it as "framework error" and "app error".
> [K8s] Executors that fail to start up because of init-container errors are
> not retried and limit the executor pool size
> -----------------------------------------------------------------------------------------------------------------------
>
> Key: SPARK-24135
> URL: https://issues.apache.org/jira/browse/SPARK-24135
> Project: Spark
> Issue Type: Bug
> Components: Kubernetes
> Affects Versions: 2.3.0
> Reporter: Matt Cheah
> Priority: Major
>
> In KubernetesClusterSchedulerBackend, we detect if executors disconnect after
> having been started or if executors hit the {{ERROR}} or {{DELETED}} states.
> When executors fail in these ways, they are removed from the pending
> executors pool and the driver should retry requesting these executors.
> However, the driver does not handle a different class of error: when the pod
> enters the {{Init:Error}} state. This state comes up when the executor fails
> to launch because one of its init-containers fails. Spark itself doesn't
> attach any init-containers to the executors. However, custom web hooks can
> run on the cluster and attach init-containers to the executor pods.
> Additionally, pod presets can specify init containers to run on these pods.
> Therefore Spark should be handling the {{Init:Error}} cases regardless if
> Spark itself is aware of init-containers or not.
> This class of error is particularly bad because when we hit this state, the
> failed executor will never start, but it's still seen as pending by the
> executor allocator. The executor allocator won't request more rounds of
> executors because its current batch hasn't been resolved to either running or
> failed. Therefore we end up with being stuck with the number of executors
> that successfully started before the faulty one failed to start, potentially
> creating a fake resource bottleneck.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]