[ https://issues.apache.org/jira/browse/SPARK-33737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dongjoon Hyun updated SPARK-33737:
----------------------------------
    Affects Version/s:     (was: 3.0.2)
                       3.2.0

> Use an Informer+Lister API in the ExecutorPodWatcher
> ----------------------------------------------------
>
>                 Key: SPARK-33737
>                 URL: https://issues.apache.org/jira/browse/SPARK-33737
>             Project: Spark
>          Issue Type: Improvement
>          Components: Kubernetes
>    Affects Versions: 3.2.0
>            Reporter: Stavros Kontopoulos
>            Priority: Major
>
> The Kubernetes backend uses the Fabric8 client and a
> [watch|https://github.com/apache/spark/blob/master/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsWatchSnapshotSource.scala#L42]
> to monitor the K8s API server for pod changes. Every watcher keeps a
> websocket connection open, and there is no caching mechanism in that part.
> Caching does exist in other areas of the Spark K8s resource manager where we
> hit the API server for Pod CRUD ops, like
> [here|https://github.com/apache/spark/blob/b8ccd755244d3cd8a81a9f4a1eafa2a4e48759d2/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsLifecycleManager.scala#L49].
> In an environment where many connections are kept open by large-scale jobs,
> this could be problematic and impose a lot of load on the API server. Many
> long-running jobs, e.g. streaming jobs, should not create enough pod changes
> to justify a continuous watching mechanism.
> Recent Fabric8 client versions have implemented a SharedInformer API+Lister;
> an example can be found
> [here|https://github.com/fabric8io/kubernetes-client/blob/master/kubernetes-examples/src/main/java/io/fabric8/kubernetes/examples/InformerExample.java#L37].
> This new API follows the implementation of the official Java K8s client and
> its Go counterpart, and it is backed by a caching mechanism that is
> re-synced after a configurable period to avoid hitting the API server all
> the time.
> There is also a lister that keeps track of the current status of resources.
> Using such a mechanism is commonplace when implementing a K8s controller.
> The suggestion is to update the client to v4.13.0 (which has all the updates
> to that API) and use the informer+lister API where applicable.
> I think the lister could also replace part of the snapshotting/notification
> mechanism.
> /cc [~dongjoon] [~eje] [~holden] WDYTH?
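
To make the informer+lister idea above concrete, here is a minimal, self-contained Java sketch of the pattern. It does not use the Fabric8 client: `InformerSketch`, `PodState`, `PodEventHandler`, `PodInformer` and `PodLister` are all hypothetical names invented for illustration, and the API server is simulated by a `Supplier`. The sketch also assumes that a resync is a full re-list, which is a simplification; a real informer would typically combine a watch with a timer-driven resync.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Illustrative sketch only: every name here is hypothetical, none is Fabric8 API.
class InformerSketch {

    // Minimal stand-in for a pod: a name plus a phase such as "Pending" or "Running".
    static final class PodState {
        final String name;
        final String phase;
        PodState(String name, String phase) { this.name = name; this.phase = phase; }
    }

    // Callbacks fired when the informer's cache changes.
    interface PodEventHandler {
        void onAdd(PodState pod);
        void onUpdate(PodState oldPod, PodState newPod);
        void onDelete(PodState pod);
    }

    // Informer: owns the cache and is the only component that calls the API server.
    static final class PodInformer {
        private final Map<String, PodState> cache = new ConcurrentHashMap<>();
        private final List<PodEventHandler> handlers = new ArrayList<>();
        private final Supplier<List<PodState>> listFromApiServer;

        PodInformer(Supplier<List<PodState>> listFromApiServer) {
            this.listFromApiServer = listFromApiServer;
        }

        void addEventHandler(PodEventHandler handler) { handlers.add(handler); }

        // One resync: re-list, reconcile the cache, notify handlers of differences.
        // Assumption: a real informer would run this on a timer and also consume a watch.
        void resync() {
            Map<String, PodState> fresh = new HashMap<>();
            for (PodState pod : listFromApiServer.get()) fresh.put(pod.name, pod);
            for (PodState pod : fresh.values()) {
                PodState old = cache.put(pod.name, pod);
                if (old == null) {
                    handlers.forEach(h -> h.onAdd(pod));
                } else if (!old.phase.equals(pod.phase)) {
                    handlers.forEach(h -> h.onUpdate(old, pod));
                }
            }
            // Anything cached but no longer listed by the server has been deleted.
            for (String name : new ArrayList<>(cache.keySet())) {
                if (!fresh.containsKey(name)) {
                    PodState gone = cache.remove(name);
                    handlers.forEach(h -> h.onDelete(gone));
                }
            }
        }

        PodLister lister() { return new PodLister(cache); }
    }

    // Lister: read-only view over the informer's cache; it never calls the API server.
    static final class PodLister {
        private final Map<String, PodState> cache;
        PodLister(Map<String, PodState> cache) { this.cache = cache; }
        PodState get(String name) { return cache.get(name); }
        int size() { return cache.size(); }
    }

    public static void main(String[] args) {
        AtomicInteger apiCalls = new AtomicInteger();
        List<PodState> server = new ArrayList<>();
        server.add(new PodState("exec-1", "Pending"));
        PodInformer informer = new PodInformer(() -> {
            apiCalls.incrementAndGet();          // count simulated API server hits
            return new ArrayList<>(server);
        });
        informer.resync();
        PodLister lister = informer.lister();
        for (int i = 0; i < 1000; i++) lister.get("exec-1"); // served from the cache
        System.out.println("API calls after 1000 reads: " + apiCalls.get()); // prints 1
    }
}
```

The point of the sketch is the split of responsibilities: only `resync()` touches the (simulated) API server, while any number of lister reads are answered from the local cache. That is the property the issue hopes to gain relative to one open websocket per watcher, although how much of the existing snapshot/notification code a lister could replace is left open here.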