Youngkwang (YK) Lee created SPARK-49061:
-------------------------------------------
Summary: Emit Kubernetes events when driver fails to request executor
Key: SPARK-49061
URL: https://issues.apache.org/jira/browse/SPARK-49061
Project: Spark
Issue Type: Improvement
Components: Kubernetes
Affects Versions: 3.5.3
Reporter: Youngkwang (YK) Lee
In Kubernetes, when a driver pod fails to request executor pods (e.g., because
a resource quota has been exceeded), the failure is visible only in the driver
logs.
We would like to surface this failure as a Kubernetes event on the driver pod
to make debugging easier. A possible solution is to add event emission logic in
ExecutorPodsAllocator.scala at the point where the executor request fails:
[https://bbgithub.dev.bloomberg.com/dnaspark/apache-spark-internal/blob/develop-3.4/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala#L439-L463]
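A minimal sketch of the proposed behaviour follows. It is not the Spark implementation: DriverEvent, EventSink, InMemoryEventSink, ExecutorRequestEvents and the reason string "FailedExecutorRequest" are hypothetical names introduced only for illustration. The fields on DriverEvent loosely mirror the type, reason, message and involvedObject fields of a Kubernetes core/v1 Event. In Spark, the sink would presumably be backed by the Kubernetes client that the allocator already holds; that wiring is an assumption and is left out here.

```scala
import scala.collection.mutable.ArrayBuffer
import scala.util.control.NonFatal

// Hypothetical, simplified stand-in for a Kubernetes core/v1 Event.
final case class DriverEvent(
    eventType: String,   // "Normal" or "Warning", as in Kubernetes events
    reason: String,      // short machine-readable cause
    message: String,     // human-readable description
    involvedPod: String, // name of the pod the event is attached to (the driver)
    namespace: String)

// Hypothetical abstraction over "send this event to the API server".
trait EventSink {
  def emit(event: DriverEvent): Unit
}

// In-memory sink so the sketch runs without a cluster.
final class InMemoryEventSink extends EventSink {
  val events: ArrayBuffer[DriverEvent] = ArrayBuffer.empty
  override def emit(event: DriverEvent): Unit = events += event
}

object ExecutorRequestEvents {
  // Hypothetical reason string; the issue does not specify one.
  val FailedReason = "FailedExecutorRequest"

  // Runs the executor pod request. On failure, emits a Warning event for the
  // driver pod and rethrows, so the existing error handling is unchanged.
  def requestWithEvent(driverPod: String, namespace: String, sink: EventSink)(
      createExecutorPod: => Unit): Unit = {
    try {
      createExecutorPod
    } catch {
      case NonFatal(e) =>
        try {
          sink.emit(DriverEvent(
            eventType = "Warning",
            reason = FailedReason,
            message = s"Failed to request executor pod: ${e.getMessage}",
            involvedPod = driverPod,
            namespace = namespace))
        } catch {
          // Event emission is best effort and must not mask the original error.
          case NonFatal(_) => ()
        }
        throw e
    }
  }
}
```

Rethrowing the original exception keeps the driver log output as it is today; the event is an additional signal that would show up alongside the driver pod's other events.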