[ https://issues.apache.org/jira/browse/SPARK-33737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dongjoon Hyun updated SPARK-33737:
----------------------------------
Affects Version/s: 3.2.0 (was: 3.0.2)
> Use an Informer+Lister API in the ExecutorPodWatcher
> ----------------------------------------------------
>
> Key: SPARK-33737
> URL: https://issues.apache.org/jira/browse/SPARK-33737
> Project: Spark
> Issue Type: Improvement
> Components: Kubernetes
> Affects Versions: 3.2.0
> Reporter: Stavros Kontopoulos
> Priority: Major
>
> The Kubernetes backend uses the Fabric8 client and a
> [watch|https://github.com/apache/spark/blob/master/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsWatchSnapshotSource.scala#L42]
> to monitor the K8s API server for pod changes. Every watcher keeps a
> websocket connection open, and there is no caching mechanism in that part of
> the code. Caching does exist in other areas of the Spark K8s resource manager
> where we hit the API server for pod CRUD operations, for example
> [here|https://github.com/apache/spark/blob/b8ccd755244d3cd8a81a9f4a1eafa2a4e48759d2/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsLifecycleManager.scala#L49].
> In an environment where many connections are kept open due to large-scale
> jobs, this could be problematic and impose significant load on the API server.
> Moreover, many long-running jobs (e.g. streaming jobs) should not produce
> enough pod changes to justify a continuous watching mechanism.
> Recent Fabric8 client versions implement a SharedInformer API + Lister;
> an example can be found
> [here|https://github.com/fabric8io/kubernetes-client/blob/master/kubernetes-examples/src/main/java/io/fabric8/kubernetes/examples/InformerExample.java#L37].
> This new API follows the implementation of the official Java K8s client and
> its Go counterpart, and it is backed by a caching mechanism that is
> re-synced after a configurable period to avoid hitting the API server all the
> time. There is also a lister that keeps track of the current status of
> resources. Using such a mechanism is commonplace when implementing a K8s
> controller.
> The suggestion is to update the client to v4.13.0 (which includes all the
> updates related to that API) and use the informer+lister API where applicable.
> I think the lister could also replace part of the snapshotting/notification
> mechanism.
> /cc [~dongjoon] [~eje] [~holden] WDYT?
>
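To make the informer + lister idea above more concrete, here is a minimal, self-contained Java sketch of the pattern. It deliberately does not use the Fabric8 API: `InformerSketch`, `Pod`, `PodEventHandler`, `PodInformer`, `PodLister` and the methods `onWatchUpsert`, `onWatchDelete`, `resync` and `countByPhase` are hypothetical names chosen for illustration. The sketch assumes a single shared watch stream feeds events into one local cache; registered handlers are notified on add/update/delete, a resync replays the cached pods to the handlers from memory, and the lister answers queries (including a snapshot-style count of pods per phase) without going back to the API server.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

public class InformerSketch {

  // Hypothetical stand-in for a Kubernetes Pod: only a name and a phase.
  static final class Pod {
    final String name;
    final String phase;
    Pod(String name, String phase) { this.name = name; this.phase = phase; }
  }

  // Add/update/delete callbacks in the general shape informer handlers use
  // (illustrative names, not the Fabric8 interface).
  interface PodEventHandler {
    void onAdd(Pod pod);
    void onUpdate(Pod oldPod, Pod newPod);
    void onDelete(Pod pod);
  }

  // Minimal informer: one shared watch stream feeds a local cache, then handlers.
  static final class PodInformer {
    private final Map<String, Pod> cache = new ConcurrentHashMap<>();
    private final List<PodEventHandler> handlers = new CopyOnWriteArrayList<>();

    void addEventHandler(PodEventHandler handler) { handlers.add(handler); }

    // Called for each ADDED/MODIFIED watch event: update the cache, then notify.
    void onWatchUpsert(Pod pod) {
      Pod old = cache.put(pod.name, pod);
      for (PodEventHandler h : handlers) {
        if (old == null) h.onAdd(pod); else h.onUpdate(old, pod);
      }
    }

    // Called for each DELETED watch event; unknown names are ignored.
    void onWatchDelete(String name) {
      Pod old = cache.remove(name);
      if (old == null) return;
      for (PodEventHandler h : handlers) h.onDelete(old);
    }

    // Periodic resync: replay the cached state to handlers from memory,
    // without issuing a new request to the API server.
    void resync() {
      for (Pod pod : cache.values()) {
        for (PodEventHandler h : handlers) h.onUpdate(pod, pod);
      }
    }

    // The lister shares the same live cache map.
    PodLister lister() { return new PodLister(cache); }
  }

  // Read-only view over the informer's cache; every query is answered locally.
  static final class PodLister {
    private final Map<String, Pod> cache;
    PodLister(Map<String, Pod> cache) { this.cache = cache; }

    Pod get(String name) { return cache.get(name); }

    List<Pod> list() {
      List<Pod> pods = new ArrayList<>(cache.values());
      pods.sort(Comparator.comparing((Pod p) -> p.name));
      return pods;
    }

    // Snapshot-style summary (pods per phase), computed on demand.
    Map<String, Integer> countByPhase() {
      Map<String, Integer> counts = new TreeMap<>();
      for (Pod pod : cache.values()) counts.merge(pod.phase, 1, Integer::sum);
      return counts;
    }
  }

  public static void main(String[] args) {
    PodInformer informer = new PodInformer();
    List<String> events = new ArrayList<>();
    informer.addEventHandler(new PodEventHandler() {
      public void onAdd(Pod pod) { events.add("add " + pod.name); }
      public void onUpdate(Pod o, Pod n) { events.add("update " + n.name + " " + n.phase); }
      public void onDelete(Pod pod) { events.add("delete " + pod.name); }
    });

    // Simulated watch events for two executor pods (made-up names).
    informer.onWatchUpsert(new Pod("exec-1", "Pending"));
    informer.onWatchUpsert(new Pod("exec-2", "Pending"));
    informer.onWatchUpsert(new Pod("exec-1", "Running"));
    informer.onWatchDelete("exec-2");

    System.out.println(events);
    System.out.println(informer.lister().countByPhase());
  }
}
```

Running `main` prints `[add exec-1, add exec-2, update exec-1 Running, delete exec-2]` followed by `{Running=1}`. Deriving summaries such as `countByPhase` from the lister on demand is one way a lister could take over part of the snapshotting/notification mechanism; whether that matches the snapshot semantics Spark's executor pod handling expects is not addressed by this sketch.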
--
This message was sent by Atlassian Jira
(v8.3.4#803005)