Marcelo Masiero Vanzin created SPARK-29905:
----------------------------------------------

             Summary: ExecutorPodsLifecycleManager has sub-optimal behavior 
with dynamic allocation
                 Key: SPARK-29905
                 URL: https://issues.apache.org/jira/browse/SPARK-29905
             Project: Spark
          Issue Type: Improvement
          Components: Kubernetes
    Affects Versions: 3.0.0
            Reporter: Marcelo Masiero Vanzin


I've been playing with dynamic allocation on k8s and noticed some weird 
behavior from ExecutorPodsLifecycleManager when it's on.

This behavior is mostly caused by the higher rate of pod updates when dynamic 
allocation is enabled. Pods being created and going away all the time generate 
lots of events, which are then translated into "snapshots" internally in Spark 
and fed to subscribers such as ExecutorPodsLifecycleManager.
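
To make that flow concrete, here is a minimal sketch of the 
event-to-snapshot-to-subscriber pipeline; all of the names below (PodSnapshot, 
SnapshotSubscriber, NaiveLifecycleSubscriber) are simplified stand-ins for 
illustration, not Spark's actual internal classes.

{code:scala}
object SnapshotFlowSketch {

  sealed trait PodState
  case object PodRunning extends PodState
  case object PodDeleted extends PodState

  // Each batch of watch events from the K8S API server is turned into a
  // snapshot carrying the current view of executor pod states.
  case class PodSnapshot(executorStates: Map[Long, PodState])

  trait SnapshotSubscriber {
    def onNewSnapshots(snapshots: Seq[PodSnapshot]): Unit
  }

  // A naive subscriber (standing in for a simplified lifecycle manager)
  // reacts to every snapshot, even when consecutive snapshots repeat the
  // same PodDeleted information for the same executor.
  class NaiveLifecycleSubscriber extends SnapshotSubscriber {
    override def onNewSnapshots(snapshots: Seq[PodSnapshot]): Unit = {
      snapshots.foreach { snap =>
        snap.executorStates.foreach {
          case (execId, PodDeleted) =>
            println(s"[debug] executor $execId pod was deleted")
          case _ => // ignore non-deleted pods in this sketch
        }
      }
    }
  }

  def main(args: Array[String]): Unit = {
    val sub = new NaiveLifecycleSubscriber
    // With dynamic allocation, many snapshots carry the same deletion.
    val repeated = Seq.fill(3)(PodSnapshot(Map(1L -> PodDeleted, 2L -> PodRunning)))
    sub.onNewSnapshots(repeated) // prints the same debug line three times
  }
}
{code}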

The first effect is a lot of spurious logging. Since snapshots are incremental, 
you can get many snapshots carrying the same "PodDeleted" information, for 
example, and ExecutorPodsLifecycleManager will log a message for each of them. 
Yes, those messages are at debug level, but if you're debugging this code, the 
repetition is really noisy and distracting.

The second effect is that, just as you get multiple log messages, you also end 
up calling into the Spark scheduler, and worse, into the K8S API server, 
multiple times for the same pod update. We can optimize that and reduce the 
chattiness with the API server.
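
As a rough idea of the kind of optimization meant here, the sketch below 
de-duplicates pod updates by remembering which executor IDs have already been 
handled, so the scheduler callback and the API-server call happen at most once 
per pod. The class and method names are placeholders, not the actual Spark 
code or a concrete patch.

{code:scala}
import scala.collection.mutable

// Hypothetical de-duplicating subscriber; removeFromScheduler and
// deletePodInApiServer are placeholders for the real calls into the Spark
// scheduler and the K8S API server.
class DedupingLifecycleSubscriber {

  // Executor IDs whose pod deletion has already been processed.
  private val handledExecutors = mutable.Set[Long]()

  def onExecutorPodDeleted(execId: Long): Unit = {
    // Set#add returns true only the first time the id is inserted, so
    // repeated snapshots carrying the same deletion are ignored.
    if (handledExecutors.add(execId)) {
      removeFromScheduler(execId)
      deletePodInApiServer(execId)
      println(s"[debug] handled deletion of executor $execId")
    }
  }

  private def removeFromScheduler(execId: Long): Unit = ()
  private def deletePodInApiServer(execId: Long): Unit = ()
}
{code}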



