This is an automated email from the ASF dual-hosted git repository.
dongjoon pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/spark-kubernetes-operator.git
The following commit(s) were added to refs/heads/main by this push:
new df9d8ac [SPARK-55634] Use `Mermaid` for `(Application|Cluster) State
Transition`
df9d8ac is described below
commit df9d8ac848018744e373e416d3513ee19eeb2263
Author: Dongjoon Hyun <[email protected]>
AuthorDate: Mon Feb 23 07:16:31 2026 -0800
[SPARK-55634] Use `Mermaid` for `(Application|Cluster) State Transition`
### What changes were proposed in this pull request?
This PR aims to use `Mermaid` markdown for the `(Application|Cluster) State
Transition` diagrams instead of `PNG` files.
### Why are the changes needed?
- This PR fixes the outdated PNG files.
- A `Succeeded` SparkApp is not supposed to be restarted, but our diagram
incorrectly shows restart as a feasible path.
https://github.com/apache/spark-kubernetes-operator/blob/f5cebb332e2f30c95c9c0ead9613e3334316d8e8/spark-operator-api/src/test/java/org/apache/spark/k8s/operator/spec/RestartPolicyTest.java#L64
<p align="center">
<img width="279" height="362" alt="Screenshot 2026-02-22 at 17 12 14"
src="https://github.com/user-attachments/assets/660e45d0-b20f-4279-9395-22c60a73bfbd"
/>
</p>
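The linked `RestartPolicyTest` encodes this invariant. As a standalone sketch of the rule (all type and method names below are illustrative, not the operator's actual API): a terminal `Succeeded` state must never re-enter `ScheduledToRestart`, regardless of the configured restart policy.

```java
// Hypothetical sketch of the restart-eligibility rule; names are illustrative
// and do not mirror the operator's real API.
public class RestartCheck {
    enum AppState { SUCCEEDED, FAILED, SCHEDULING_FAILURE }

    enum RestartPolicy { NEVER, ALWAYS, ON_FAILURE }

    // Restart is considered only for failure states: a Succeeded application
    // is terminal and never transitions back to ScheduledToRestart.
    static boolean attemptRestart(RestartPolicy policy, AppState state) {
        if (state == AppState.SUCCEEDED) {
            return false;
        }
        return policy == RestartPolicy.ALWAYS || policy == RestartPolicy.ON_FAILURE;
    }

    public static void main(String[] args) {
        // Succeeded is never restarted, even under ALWAYS.
        System.out.println(attemptRestart(RestartPolicy.ALWAYS, AppState.SUCCEEDED));
        // A failure state with ON_FAILURE is restartable.
        System.out.println(attemptRestart(RestartPolicy.ON_FAILURE, AppState.FAILED));
    }
}
```

This matches the corrected Mermaid diagram, where only the `Failures` composite state has an edge to `ScheduledToRestart`.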
- A `RunningHealthy` SparkCluster can transition to `ResourceReleased`
directly, but our diagram misses that link.
<p align="center">
<img width="343" height="175" alt="Screenshot 2026-02-22 at 17 20 20"
src="https://github.com/user-attachments/assets/e6cbdd5d-2a83-4c92-bbb5-276c799ad696"
/>
</p>
- Unlike `PNG` files, `Mermaid` markdown is editable and traceable, so we
can update the diagrams promptly whenever the code changes.
### Does this PR introduce _any_ user-facing change?
No behavior change.
### How was this patch tested?
Manual review.
**BEFORE**
-
https://github.com/apache/spark-kubernetes-operator/blob/main/docs/architecture.md
**AFTER**
-
https://github.com/dongjoon-hyun/spark-kubernetes-operator/blob/SPARK-55634/docs/architecture.md
### Was this patch authored or co-authored using generative AI tooling?
Generated-by: `Gemini 3.1 Pro (High)` on `Antigravity`
Closes #516 from dongjoon-hyun/SPARK-55634.
Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
---
docs/architecture.md | 70 ++++++++++++++++++++++++++-
docs/resources/application_state_machine.png | Bin 82299 -> 0 bytes
docs/resources/cluster_state_machine.png | Bin 15835 -> 0 bytes
3 files changed, 68 insertions(+), 2 deletions(-)
diff --git a/docs/architecture.md b/docs/architecture.md
index 477a0f8..e359c1f 100644
--- a/docs/architecture.md
+++ b/docs/architecture.md
@@ -48,7 +48,58 @@ launching Spark deployments and submitting jobs under the
hood. It also uses
## Application State Transition
-[](resources/application_state_machine.png)
+```mermaid
+stateDiagram-v2
+
+ [*] --> Submitted
+
+ Submitted --> DriverRequested
+ Submitted --> SchedulingFailure
+
+ ScheduledToRestart --> DriverRequested
+
+ DriverRequested --> DriverStarted
+ DriverRequested --> DriverStartTimedOut
+
+ DriverStarted --> DriverReady
+ DriverStarted --> DriverReadyTimedOut
+ DriverStarted --> DriverEvicted
+
+ DriverReady --> RunningHealthy
+ DriverReady --> InitializedBelowThresholdExecutors
+ DriverReady --> ExecutorsStartTimedOut
+ DriverReady --> DriverEvicted
+
+ InitializedBelowThresholdExecutors --> RunningHealthy
+ InitializedBelowThresholdExecutors --> Failed
+
+ RunningHealthy --> Succeeded
+ RunningHealthy --> RunningWithBelowThresholdExecutors
+ RunningHealthy --> Failed
+
+ RunningWithBelowThresholdExecutors --> RunningHealthy
+ RunningWithBelowThresholdExecutors --> Failed
+
+ state Failures {
+ SchedulingFailure
+ DriverStartTimedOut
+ DriverReadyTimedOut
+ ExecutorsStartTimedOut
+ DriverEvicted
+ Failed
+ }
+
+ Failures --> ScheduledToRestart : Retry Configured
+ Failures --> ResourceReleased : Terminated
+
+ Succeeded --> ResourceReleased
+ ResourceReleased --> [*]
+
+ %% Place TerminatedWithoutReleaseResources further to avoid overlap
+ Failures --> TerminatedWithoutReleaseResources : Retain Policy
+ Succeeded --> TerminatedWithoutReleaseResources
+ TerminatedWithoutReleaseResources --> [*]
+```
* Spark applications are expected to run from submitted to succeeded before
releasing resources
* User may configure the app CR to time-out after given threshold of time if
it cannot reach healthy
@@ -74,7 +125,22 @@ launching Spark deployments and submitting jobs under the
hood. It also uses
## Cluster State Transition
-[](resources/cluster_state_machine.png)
+```mermaid
+stateDiagram-v2
+
+ [*] --> Submitted
+
+ Submitted --> RunningHealthy
+ Submitted --> SchedulingFailure
+
+ RunningHealthy --> Failed
+ RunningHealthy --> ResourceReleased
+
+ SchedulingFailure --> ResourceReleased
+ Failed --> ResourceReleased
+
+ ResourceReleased --> [*]
+```
* Spark clusters are expected to be always running after submitted.
* Similar to Spark applications, K8s resources created for a cluster would be
deleted as the final
diff --git a/docs/resources/application_state_machine.png
b/docs/resources/application_state_machine.png
deleted file mode 100644
index 3b3df3d..0000000
Binary files a/docs/resources/application_state_machine.png and /dev/null differ
diff --git a/docs/resources/cluster_state_machine.png
b/docs/resources/cluster_state_machine.png
deleted file mode 100644
index 2a8dcdd..0000000
Binary files a/docs/resources/cluster_state_machine.png and /dev/null differ
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]