[ 
https://issues.apache.org/jira/browse/SPARK-33737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-33737:
----------------------------------
    Affects Version/s:     (was: 3.0.2)
                       3.2.0

> Use an Informer+Lister API in the ExecutorPodWatcher
> ----------------------------------------------------
>
>                 Key: SPARK-33737
>                 URL: https://issues.apache.org/jira/browse/SPARK-33737
>             Project: Spark
>          Issue Type: Improvement
>          Components: Kubernetes
>    Affects Versions: 3.2.0
>            Reporter: Stavros Kontopoulos
>            Priority: Major
>
> The Kubernetes backend uses the Fabric8 client and a 
> [watch|https://github.com/apache/spark/blob/master/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsWatchSnapshotSource.scala#L42]
>  to monitor the K8s API server for pod changes. Every watcher keeps a 
> websocket connection open and has no caching mechanism in that code path. Caching 
> in the Spark K8s resource manager exists in other areas where we hit 
> the API server for Pod CRUD ops, like 
> [here|https://github.com/apache/spark/blob/b8ccd755244d3cd8a81a9f4a1eafa2a4e48759d2/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsLifecycleManager.scala#L49].
> In an environment where many connections are kept open by large-scale jobs, this 
> could be problematic and impose a lot of load on the API server. Many 
> long-running jobs, e.g. streaming jobs, should not create enough pod changes to 
> justify a continuous watching mechanism.
> Recent Fabric8 client versions have implemented a SharedInformer+Lister API; 
> an example can be found 
> [here|https://github.com/fabric8io/kubernetes-client/blob/master/kubernetes-examples/src/main/java/io/fabric8/kubernetes/examples/InformerExample.java#L37].
> This new API follows the implementation of the official Java K8s client and 
> its Go counterpart, and it is backed by a caching mechanism that is 
> re-synced after a configurable period to avoid hitting the API server all the 
> time. There is also a lister that keeps track of the current status of resources. 
> Using such a mechanism is commonplace when implementing a K8s controller.
> The suggestion is to update the client to v4.13.0 (which has all the updates for 
> that API) and use the informer+lister API where applicable. 
> I think the lister could also replace part of the snapshotting/notification 
> mechanism.
> /cc [~dongjoon] [~eje] [~holden] WDYTH?
>  
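
For readers unfamiliar with the pattern, here is a rough, library-free sketch of the informer+lister idea described above. It is not the Fabric8 API and not Spark code: Pod, Handler, PodInformer and the apiServer supplier are hypothetical names made up for illustration, and the toy collapses the real mechanism (a watch plus a periodic resync) into a single re-list step. It only aims to show the two properties the issue relies on: handlers see a diff of pod changes, and reads go through a cached lister instead of the API server.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Supplier;

class InformerListerSketch {

    /** Hypothetical stand-in for an executor pod: just a name and a phase. */
    static final class Pod {
        final String name;
        final String phase;
        Pod(String name, String phase) { this.name = name; this.phase = phase; }
    }

    /** Callback fired when the cached view changes. */
    interface Handler {
        void onEvent(String type, Pod pod);
    }

    /** Toy informer: owns the cache, refreshes it on resync, notifies handlers. */
    static final class PodInformer {
        private final Map<String, Pod> cache = new TreeMap<>();
        private final List<Handler> handlers = new ArrayList<>();
        private final Supplier<List<Pod>> apiServer; // stands in for a LIST call
        int apiCalls = 0;                            // counts simulated API hits

        PodInformer(Supplier<List<Pod>> apiServer) { this.apiServer = apiServer; }

        void addHandler(Handler h) { handlers.add(h); }

        private void emit(String type, Pod pod) {
            for (Handler h : handlers) h.onEvent(type, pod);
        }

        /** Re-list from the simulated API server; emit events only for differences. */
        void resync() {
            apiCalls++;
            Map<String, Pod> fresh = new TreeMap<>();
            for (Pod p : apiServer.get()) fresh.put(p.name, p);
            for (Pod p : fresh.values()) {
                Pod old = cache.get(p.name);
                if (old == null) emit("ADDED", p);
                else if (!old.phase.equals(p.phase)) emit("MODIFIED", p);
            }
            for (Pod old : new ArrayList<>(cache.values())) {
                if (!fresh.containsKey(old.name)) emit("DELETED", old);
            }
            cache.clear();
            cache.putAll(fresh);
        }

        /** Lister view: reads are served from the cache, not the API server. */
        List<Pod> list() { return new ArrayList<>(cache.values()); }

        Pod get(String name) { return cache.get(name); }
    }

    public static void main(String[] args) {
        List<Pod> cluster = new ArrayList<>();
        cluster.add(new Pod("exec-1", "Pending"));

        PodInformer informer = new PodInformer(() -> new ArrayList<>(cluster));
        informer.addHandler((type, pod) ->
            System.out.println(type + " " + pod.name + " " + pod.phase));

        informer.resync();                          // first sync populates the cache
        cluster.set(0, new Pod("exec-1", "Running"));
        cluster.add(new Pod("exec-2", "Pending"));
        informer.resync();                          // second sync reports only the diff

        // Many lister reads, no additional simulated API calls.
        for (int i = 0; i < 1000; i++) informer.list();
        System.out.println("pods=" + informer.list().size()
            + " apiCalls=" + informer.apiCalls);
    }
}
```

In this sketch the thousand list() calls never touch the simulated API server; only the two resync() calls do. That is the property that would let a lister stand in for part of the snapshotting logic, assuming the real informer cache is kept fresh enough for the scheduler's needs.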


