This is an automated email from the ASF dual-hosted git repository.

feiwang pushed a commit to branch branch-1.10
in repository https://gitbox.apache.org/repos/asf/kyuubi.git

The following commit(s) were added to refs/heads/branch-1.10 by this push:
     new 7a1c750ac5 [KYUUBI #7025] [KYUUBI #6686][FOLLOWUP] Prefer terminated container app state than terminated pod state
7a1c750ac5 is described below

commit 7a1c750ac513a9592c10420ea050aea27d964fa9
Author: Wang, Fei <fwan...@ebay.com>
AuthorDate: Wed Apr 16 10:12:10 2025 -0700

[KYUUBI #7025] [KYUUBI #6686][FOLLOWUP] Prefer terminated container app state than terminated pod state

### Why are the changes needed?

I found the following for a Kyuubi batch on Kubernetes:

1. The batch had already reached `FINISHED`.
2. I then deleted the pod manually and checked k8s-audit.log, where the appState had become `FAILED`.

```
2025-04-15 11:16:30.453 INFO [-675216314-pool-44-thread-839] org.apache.kyuubi.engine.KubernetesApplicationAuditLogger: label=61e7d8c1-e5a9-46cd-83e7-c611003f0224 context=97 namespace=dls-prod pod=kyuubi-spark-61e7d8c1-e5a9-46cd-83e7-c611003f0224-driver podState=Running containers=[microvault->ContainerState(running=ContainerStateRunning(startedAt=2025-04-15T18:13:48Z, additionalProperties={}), terminated=null, waiting=null, additionalProperties={}),spark-kubernet [...]
:2025-04-15 11:16:30.854 INFO [-675216314-pool-44-thread-840] org.apache.kyuubi.engine.KubernetesApplicationAuditLogger: label=61e7d8c1-e5a9-46cd-83e7-c611003f0224 context=97 namespace=dls-prod pod=kyuubi-spark-61e7d8c1-e5a9-46cd-83e7-c611003f0224-driver podState=Failed containers=[microvault->ContainerState(running=null, terminated=ContainerStateTerminated(containerID=containerd://91654e3ee74e2c31218e14be201b50a4a604c2ad15d3afd84dc6f620e59894b7, exitCode=2, finishedAt=2 [...]
```

This PR is a follow-up to #6690, which ignores the container state if the pod is terminated. It is more reasonable to prefer a terminated container state over a terminated pod state.

### How was this patch tested?

Integration testing.

```
:2025-04-15 13:53:24.551 INFO [-1077768163-pool-36-thread-3] org.apache.kyuubi.engine.KubernetesApplicationAuditLogger: eventType=DELETE label=e0eb4580-3cfa-43bf-bdcc-efeabcabc93c context=97 namespace=dls-prod pod=kyuubi-spark-e0eb4580-3cfa-43bf-bdcc-efeabcabc93c-driver podState=Failed containers=[microvault->ContainerState(running=null, terminated=ContainerStateTerminated(containerID=containerd://66c42206730950bd422774e3c1b0f426d7879731788cea609bbfe0daab24a76 [...]
```

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #7025 from turboFei/container_terminated.

Closes #7025
Closes #6686

a3b2a5a56 [Wang, Fei] comments
4356d1bc9 [Wang, Fei] fix the app state logical

Authored-by: Wang, Fei <fwan...@ebay.com>
Signed-off-by: Wang, Fei <fwan...@ebay.com>
(cherry picked from commit 7e199d6fdbdf52222bb3eadd056b9e5a2295f36e)
Signed-off-by: Wang, Fei <fwan...@ebay.com>
---
 .../kyuubi/engine/KubernetesApplicationOperation.scala    | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/kyuubi-server/src/main/scala/org/apache/kyuubi/engine/KubernetesApplicationOperation.scala b/kyuubi-server/src/main/scala/org/apache/kyuubi/engine/KubernetesApplicationOperation.scala
index c7ce750f2c..2e57c722f4 100644
--- a/kyuubi-server/src/main/scala/org/apache/kyuubi/engine/KubernetesApplicationOperation.scala
+++ b/kyuubi-server/src/main/scala/org/apache/kyuubi/engine/KubernetesApplicationOperation.scala
@@ -553,15 +553,18 @@ object KubernetesApplicationOperation extends Logging {
     }
 
     val podAppState = podStateToApplicationState(pod.getStatus.getPhase)
-    val containerAppState = containerStatusToBuildAppState
+    val containerAppStateOpt = containerStatusToBuildAppState
       .map(_.getState)
       .map(containerStateToApplicationState)
 
-    // When the pod app state is terminated, the container app state will be ignored
-    val applicationState = if (ApplicationState.isTerminated(podAppState)) {
-      podAppState
-    } else {
-      containerAppState.getOrElse(podAppState)
+    val applicationState = containerAppStateOpt match {
+      // for cases that spark container already terminated, but sidecar containers live
+      case Some(containerAppState)
+          if ApplicationState.isTerminated(containerAppState) => containerAppState
+      // we don't need to care about container state if pod is already terminated
+      case _ if ApplicationState.isTerminated(podAppState) => podAppState
+      case Some(containerAppState) => containerAppState
+      case None => podAppState
     }
 
     val applicationError = if (ApplicationState.isFailed(applicationState)) {
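
The precedence change in the hunk above can be illustrated in isolation. The following is a minimal sketch, not the Kyuubi implementation: `AppState`, `AppStateResolver`, `resolveBefore`, and `resolveAfter` are hypothetical names, and `AppState` is a simplified stand-in for Kyuubi's `ApplicationState` that assumes only `Finished` and `Failed` count as terminated. It contrasts the removed `if`/`else` logic with the added `match`, using the scenario from the commit message: the Spark container finished successfully, and the pod was later reported as failed after a manual delete.

```scala
// Hypothetical, simplified stand-in for Kyuubi's ApplicationState.
sealed trait AppState
object AppState {
  case object Pending extends AppState
  case object Running extends AppState
  case object Finished extends AppState
  case object Failed extends AppState

  // Assumption for this sketch: only Finished and Failed are terminal.
  def isTerminated(state: AppState): Boolean = state match {
    case Finished | Failed => true
    case _ => false
  }
}

object AppStateResolver {
  import AppState._

  // Logic removed by the diff: a terminated pod state always wins.
  def resolveBefore(containerAppStateOpt: Option[AppState], podAppState: AppState): AppState =
    if (isTerminated(podAppState)) podAppState
    else containerAppStateOpt.getOrElse(podAppState)

  // Logic added by the diff: a terminated container state wins first.
  def resolveAfter(containerAppStateOpt: Option[AppState], podAppState: AppState): AppState =
    containerAppStateOpt match {
      // Spark container already terminated, even if sidecars are still alive
      case Some(containerAppState) if isTerminated(containerAppState) => containerAppState
      // container is not terminated (or unknown), but the pod is
      case _ if isTerminated(podAppState) => podAppState
      case Some(containerAppState) => containerAppState
      case None => podAppState
    }
}
```

Under these assumptions, `AppStateResolver.resolveBefore(Some(AppState.Finished), AppState.Failed)` yields `Failed`, while `resolveAfter` with the same arguments yields `Finished`. For every other combination of inputs the two functions agree, which matches the narrow scope described in the commit message.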