This is an automated email from the ASF dual-hosted git repository.
dongjoon pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/spark-kubernetes-operator.git
The following commit(s) were added to refs/heads/main by this push:
new de525e2 [SPARK-49464] Add documentations
de525e2 is described below
commit de525e2cd96f32d07f0a9541a28f6919e34360dc
Author: zhou-jiang <[email protected]>
AuthorDate: Fri Oct 4 16:34:29 2024 -0700
[SPARK-49464] Add documentations
### What changes were proposed in this pull request?
This PR includes Operator docs under `docs/` for configuration,
architecture, operations, and metrics.
### Why are the changes needed?
Operator docs are necessary for users to understand the design and to get
started with operator installation
### Does this PR introduce _any_ user-facing change?
No - new release
### How was this patch tested?
CIs
### Was this patch authored or co-authored using generative AI tooling?
No
Closes #113 from jiangzho/doc.
Authored-by: zhou-jiang <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
---
build-tools/docs-utils/build.gradle | 2 +
docs/architecture.md | 81 ++++++++++
docs/config_properties.md | 40 +++++
docs/configuration.md | 144 +++++++++++++++++
docs/operations.md | 151 ++++++++++++++++++
docs/resources/application_state_machine.png | Bin 0 -> 82299 bytes
docs/resources/cluster_state_machine.png | Bin 0 -> 15835 bytes
docs/resources/prometheus.png | Bin 0 -> 184821 bytes
docs/spark_custom_resources.md | 227 +++++++++++++++++++++++++++
9 files changed, 645 insertions(+)
diff --git a/build-tools/docs-utils/build.gradle b/build-tools/docs-utils/build.gradle
index 2cdde29..5ce8e64 100644
--- a/build-tools/docs-utils/build.gradle
+++ b/build-tools/docs-utils/build.gradle
@@ -38,3 +38,5 @@ tasks.register('generateConfPropsDoc', Exec) {
description = "Generate config properties doc for operator"
commandLine "java", "-classpath",
sourceSets.main.runtimeClasspath.getAsPath(), javaMainClass, docsPath
}
+
+build.finalizedBy(generateConfPropsDoc)
diff --git a/docs/architecture.md b/docs/architecture.md
new file mode 100644
index 0000000..0539355
--- /dev/null
+++ b/docs/architecture.md
@@ -0,0 +1,81 @@
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Design & Architecture
+
+**Spark-Kubernetes-Operator** (Operator) acts as a control plane to manage the complete deployment lifecycle of Spark applications and clusters. The Operator can be installed on Kubernetes cluster(s) using Helm. In most production environments it is deployed in a designated namespace and controls Spark workloads in one or more managed namespaces. The Spark Operator enables users to describe Spark application(s) or cluster(s) as [Custom Resources](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/).
+
+The Operator continuously tracks events related to the Spark custom resources in its reconciliation loops:
+
+For SparkApplications:
+
+* User submits a SparkApplication custom resource (CR) using kubectl / API
+* Operator launches the driver and observes its status
+* Operator observes driver-spawned resources (e.g. executors) and records their status until the app terminates
+* Operator releases all Spark-app-owned resources back to the cluster
+
+For SparkClusters:
+
+* User submits a SparkCluster custom resource (CR) using kubectl / API
+* Operator launches master and worker(s) based on the CR spec and observes their status
+* Operator releases all Spark-cluster-owned resources back to the cluster upon failure
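+
+As an illustration of the submission step, a minimal `SparkApplication` CR might look like the sketch below. The `apiVersion` and all `spec` fields here are illustrative placeholders, not the authoritative CRD schema; consult the custom resource docs for the exact field names.
+
+```yaml
+apiVersion: spark.apache.org/v1alpha1   # hypothetical group/version
+kind: SparkApplication
+metadata:
+  name: spark-pi
+  namespace: spark-workloads
+spec:
+  # Illustrative fields only; check the CRD docs for the actual schema.
+  mainClass: org.apache.spark.examples.SparkPi
+  sparkConf:
+    spark.executor.instances: "2"
+```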
+
+The Operator is built with the [Java Operator SDK](https://javaoperatorsdk.io/) for launching Spark deployments and submitting jobs under the hood. It also uses the [fabric8](https://fabric8.io/) client to interact with the Kubernetes API Server.
+
+## Application State Transition
+
+[<img src="resources/application_state_machine.png">](resources/application_state_machine.png)
+
+* Spark applications are expected to run from submitted to succeeded before releasing resources
+* User may configure the app CR to time out if it cannot reach a healthy state within a given threshold. The timeout can be configured for different lifecycle stages, e.g. when the driver is starting and when requesting executor pods. To update the default threshold, configure `.spec.applicationTolerations.applicationTimeoutConfig` for the application.
+* K8s resources created for an application would be deleted as the final stage of the application lifecycle by default. This is to ensure resource quota release for completed applications.
+* It is also possible to retain the created k8s resources for debug or audit purposes. To do so, users may set `.spec.applicationTolerations.resourceRetainPolicy` to `OnFailure` to retain resources upon application failure, or to `Always` to retain resources regardless of the application's final state.
+  - This controls the behavior of k8s resources created by the Operator for the application, including the driver pod, config map, service, and PVC (if enabled). It does not apply to resources created by the driver (for example, executor pods). Users may configure SparkConf to include `spark.kubernetes.executor.deleteOnTermination` for executor retention. Please refer to the [Spark docs](https://spark.apache.org/docs/latest/running-on-kubernetes.html) for details.
+  - The created k8s resources have an `ownerReference` to their related `SparkApplication` custom resource, such that they can be garbage collected when the `SparkApplication` is deleted.
+  - Please be advised that k8s resources would not be retained if the application is configured to restart. This is to avoid unexpected resource quota usage increases or resource conflicts among multiple attempts.
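+
+The tolerations described above could be expressed in the CR spec roughly as in the sketch below. Only the `applicationTolerations`, `applicationTimeoutConfig`, and `resourceRetainPolicy` paths are taken from this doc; the nested timeout keys are illustrative and should be checked against the custom resource reference.
+
+```yaml
+spec:
+  applicationTolerations:
+    # Retain operator-created resources when the app fails, for debugging.
+    resourceRetainPolicy: OnFailure
+    applicationTimeoutConfig:
+      # Illustrative keys for per-stage timeouts (driver start, executor request).
+      driverStartTimeoutMillis: 300000
+      executorStartTimeoutMillis: 300000
+```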
+
+## Cluster State Transition
+
+[<img src="resources/cluster_state_machine.png">](resources/cluster_state_machine.png)
+
+* Spark clusters are expected to keep running after submission.
+* Similar to Spark applications, K8s resources created for a cluster would be deleted as the final stage of the cluster lifecycle by default.
diff --git a/docs/config_properties.md b/docs/config_properties.md
new file mode 100644
index 0000000..c86ada2
--- /dev/null
+++ b/docs/config_properties.md
@@ -0,0 +1,40 @@
+[//]: # (This doc is automatically generated by a gradle task; manual updates will be overridden.)
+# Spark Operator Config Properties
+ | Key | Type | Default Value | Allow Hot Reloading | Description |
+ | --- | --- | --- | --- | --- |
+ | spark.kubernetes.operator.name | String | spark-kubernetes-operator | false | Name of the operator. |
+ | spark.kubernetes.operator.namespace | String | default | false | Namespace that operator is deployed within. |
+ | spark.kubernetes.operator.watchedNamespaces | String | default | true | Comma-separated list of namespaces that the operator would be watching for Spark resources. If set to '*', operator would watch all namespaces. |
+ | spark.kubernetes.operator.terminateOnInformerFailureEnabled | Boolean | false | false | Enable to indicate informer errors should stop operator startup. If disabled, operator startup will ignore recoverable errors, caused for example by RBAC issues, and will retry periodically. |
+ | spark.kubernetes.operator.reconciler.terminationTimeoutSeconds | Integer | 30 | false | Grace period for operator shutdown before reconciliation threads are killed. |
+ | spark.kubernetes.operator.reconciler.parallelism | Integer | 50 | false | Thread pool size for Spark Operator reconcilers. An unbounded pool would be used if set to a non-positive number. |
+ | spark.kubernetes.operator.reconciler.foregroundRequestTimeoutSeconds | Long | 30 | true | Timeout (in seconds) for requests made to API server. This applies only to foreground requests. |
+ | spark.kubernetes.operator.reconciler.intervalSeconds | Long | 120 | true | Interval (in seconds, non-negative) to reconcile Spark applications. Note that reconciliation is always expected to be triggered when app spec / status is updated. This interval controls the reconcile behavior of operator reconciliation even when there's no update on SparkApplication, e.g. to determine whether a hanging app needs to be proactively terminated. Thus this is recommended to set to above 2 minutes t [...] |
+ | spark.kubernetes.operator.reconciler.trimStateTransitionHistoryEnabled | Boolean | true | true | When enabled, operator would trim state transition history when a new attempt starts, keeping previous attempt summary only. |
+ | spark.kubernetes.operator.reconciler.appStatusListenerClassNames | String | | false | Comma-separated names of SparkAppStatusListener class implementations |
+ | spark.kubernetes.operator.reconciler.clusterStatusListenerClassNames | String | | false | Comma-separated names of SparkClusterStatusListener class implementations |
+ | spark.kubernetes.operator.dynamicConfig.enabled | Boolean | false | false | When enabled, operator would use config map as source of truth for config property override. The config map needs to be created in spark.kubernetes.operator.namespace, and labeled with operator name. |
+ | spark.kubernetes.operator.dynamicConfig.selector | String | app.kubernetes.io/name=spark-kubernetes-operator,app.kubernetes.io/component=operator-dynamic-config-overrides | false | The selector str applied to dynamic config map. |
+ | spark.kubernetes.operator.dynamicConfig.reconcilerParallelism | Integer | 1 | false | Parallelism for dynamic config reconciler. An unbounded pool would be used if set to a non-positive number. |
+ | spark.kubernetes.operator.reconciler.rateLimiter.refreshPeriodSeconds | Integer | 15 | false | Operator rate limiter refresh period (in seconds) for each resource. |
+ | spark.kubernetes.operator.reconciler.rateLimiter.maxLoopForPeriod | Integer | 5 | false | Max number of reconcile loops triggered within the rate limiter refresh period for each resource. Setting the limit <= 0 disables the limiter. |
+ | spark.kubernetes.operator.reconciler.retry.initialIntervalSeconds | Integer | 5 | false | Initial interval (in seconds) of retries on unhandled controller errors. |
+ | spark.kubernetes.operator.reconciler.retry.intervalMultiplier | Double | 1.5 | false | Interval multiplier of retries on unhandled controller errors. Set this to 1 for linear retry. |
+ | spark.kubernetes.operator.reconciler.retry.maxIntervalSeconds | Integer | -1 | false | Max interval (in seconds) of retries on unhandled controller errors. Set to non-positive for unlimited. |
+ | spark.kubernetes.operator.api.retryMaxAttempts | Integer | 15 | false | Max attempts of retries on unhandled controller errors. Setting this to a non-positive value means no retry. |
+ | spark.kubernetes.operator.api.retryAttemptAfterSeconds | Long | 1 | false | Default time (in seconds) to wait till next request. This would be used if server does not set Retry-After in response. Setting this to a non-positive number means immediate retry. |
+ | spark.kubernetes.operator.api.statusPatchMaxAttempts | Long | 3 | false | Maximal number of retry attempts of requests to k8s server for resource status update. This would be performed on top of k8s client spark.kubernetes.operator.retry.maxAttempts to overcome potential conflicting update on the same SparkApplication. This should be a positive number. |
+ | spark.kubernetes.operator.api.secondaryResourceCreateMaxAttempts | Long | 3 | false | Maximal number of retry attempts of requesting secondary resource for Spark application. This would be performed on top of k8s client spark.kubernetes.operator.retry.maxAttempts to overcome potential conflicting reconcile on the same SparkApplication. This should be a positive number. |
+ | spark.kubernetes.operator.metrics.josdkMetricsEnabled | Boolean | true | false | When enabled, the josdk metrics will be added in metrics source and configured for operator. |
+ | spark.kubernetes.operator.metrics.clientMetricsEnabled | Boolean | true | false | Enable KubernetesClient metrics for measuring the HTTP traffic to the Kubernetes API Server. Since the metrics are collected via Okhttp interceptors, this can be disabled when opting in to customized interceptors. |
+ | spark.kubernetes.operator.metrics.clientMetricsGroupByResponseCodeEnabled | Boolean | true | false | When enabled, additional metrics grouped by http response code group (1xx, 2xx, 3xx, 4xx, 5xx) received from API server will be added. Users can disable it when their monitoring system can combine lower level kubernetes.client.http.response.<3-digit-response-code> metrics. |
+ | spark.kubernetes.operator.metrics.port | Integer | 19090 | false | The port used for checking metrics |
+ | spark.kubernetes.operator.health.probePort | Integer | 19091 | false | The port used for health/readiness check probe status. |
+ | spark.kubernetes.operator.health.sentinelExecutorPoolSize | Integer | 3 | false | Size of executor service in Sentinel Managers to check the health of sentinel resources. |
+ | spark.kubernetes.operator.health.sentinelResourceReconciliationDelaySeconds | Integer | 60 | true | Allowed max time (seconds) between spec update and reconciliation for sentinel resources. |
+ | spark.kubernetes.operator.leaderElection.enabled | Boolean | false | false | Enable leader election for the operator to allow running standby instances. When this is disabled, only one operator instance is expected to be up and running at any time (replica = 1) to avoid race conditions. |
+ | spark.kubernetes.operator.leaderElection.leaseName | String | spark-operator-lease | false | Leader election lease name, must be unique for leases in the same namespace. |
+ | spark.kubernetes.operator.leaderElection.leaseDurationSeconds | Integer | 180 | false | Leader election lease duration in seconds, non-negative. |
+ | spark.kubernetes.operator.leaderElection.renewDeadlineSeconds | Integer | 120 | false | Leader election renew deadline in seconds, non-negative. This needs to be smaller than the lease duration to allow the current leader to renew the lease before it expires. |
+ | spark.kubernetes.operator.leaderElection.retryPeriodSeconds | Integer | 5 | false | Leader election retry period in seconds, non-negative. |
+
diff --git a/docs/configuration.md b/docs/configuration.md
new file mode 100644
index 0000000..bafd3c5
--- /dev/null
+++ b/docs/configuration.md
@@ -0,0 +1,144 @@
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Configuration
+
+## Configure Operator
+
+Spark Operator supports different ways to configure the behavior:
+
+* **spark-operator.properties** provided when deploying the operator. In addition to the [property file](../build-tools/helm/spark-kubernetes-operator/conf/spark-operator.properties), it is also possible to override or append config properties in the helm [Values file](../build-tools/helm/spark-kubernetes-operator/values.yaml).
+* **System Properties**: when provided as system properties (e.g. via `-D` options to the operator JVM), they override the values provided in the property file.
+* **Hot property loading**: when enabled, a [configmap](https://kubernetes.io/docs/concepts/configuration/configmap/) would be created with the operator in the same namespace. The operator monitors updates performed on the configmap. Hot property reloading takes higher precedence compared with the default properties override.
+  - An example use case: the operator uses hot properties to determine the list of namespace(s) in which to operate Spark applications. The hot properties config map can be updated and maintained by a user or an additional microservice to tune the operator behavior without rebooting it.
+  - Please be advised that not all properties can be hot-loaded and honored at runtime. Refer to the list of [supported properties](./config_properties.md) for more details.
+
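+
+As a concrete sketch, a `spark-operator.properties` override might combine keys from the [supported properties list](./config_properties.md); the values below are arbitrary examples, not recommendations:
+
+```properties
+# Watch two dedicated namespaces instead of the default one
+spark.kubernetes.operator.watchedNamespaces=spark-team-a,spark-team-b
+# Reconcile interval in seconds (hot-reloadable)
+spark.kubernetes.operator.reconciler.intervalSeconds=180
+```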
+To enable hot properties loading, update the **helm chart values file** with
+
+```yaml
+operatorConfiguration:
+  spark-operator.properties: |+
+    spark.kubernetes.operator.dynamicConfig.enabled=true
+    # ... all other config overrides ...
+  dynamicConfig:
+    create: true
+```
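+
+When dynamic config is enabled, the hot-properties config map the operator watches could look like the sketch below. The labels match the default `spark.kubernetes.operator.dynamicConfig.selector` from the [config properties list](./config_properties.md), while the name, namespace, and data keys are illustrative:
+
+```yaml
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  name: spark-operator-dynamic-config   # illustrative name
+  namespace: spark-operator             # must be the operator's namespace
+  labels:
+    app.kubernetes.io/name: spark-kubernetes-operator
+    app.kubernetes.io/component: operator-dynamic-config-overrides
+data:
+  spark.kubernetes.operator.reconciler.intervalSeconds: "60"
+```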
+
+## Metrics
+
+The Spark operator, following [Apache Spark](https://spark.apache.org/docs/latest/monitoring.html#metrics), has a configurable metrics system based on the [Dropwizard Metrics Library](https://metrics.dropwizard.io/4.2.25/). Note that the Spark Operator does not have a Spark UI; MetricsServlet and PrometheusServlet from the org.apache.spark.metrics.sink package are not supported. If you are interested in Prometheus metrics exporting, please take a look at the section [Forward Metrics to Prometheus](#forward-metrics-to-prometheus) below.
+
+### JVM Metrics
+
+Spark Operator collects JVM metrics via [Codahale JVM Metrics](https://javadoc.io/doc/com.codahale.metrics/metrics-jvm/latest/index.html):
+
+- BufferPoolMetricSet
+- FileDescriptorRatioGauge
+- GarbageCollectorMetricSet
+- MemoryUsageGaugeSet
+- ThreadStatesGaugeSet
+
+### Kubernetes Client Metrics
+
+| Metrics Name | Type | Description |
+|---|---|---|
+| kubernetes.client.http.request | Meter | Tracking the rates of HTTP requests sent to the Kubernetes API Server |
+| kubernetes.client.http.response | Meter | Tracking the rates of HTTP responses from the Kubernetes API Server |
+| kubernetes.client.http.response.failed | Meter | Tracking the rates of HTTP requests which have no response from the Kubernetes API Server |
+| kubernetes.client.http.response.latency.nanos | Histogram | Measures the statistical distribution of HTTP response latency from the Kubernetes API Server |
+| kubernetes.client.http.response.<ResponseCode> | Meter | Tracking the rates of HTTP responses by response code from the Kubernetes API Server |
+| kubernetes.client.http.request.<RequestMethod> | Meter | Tracking the rates of HTTP requests by method type to the Kubernetes API Server |
+| kubernetes.client.http.response.1xx | Meter | Tracking the rates of HTTP Code 1xx responses (informational) received from the Kubernetes API Server per response code. |
+| kubernetes.client.http.response.2xx | Meter | Tracking the rates of HTTP Code 2xx responses (success) received from the Kubernetes API Server per response code. |
+| kubernetes.client.http.response.3xx | Meter | Tracking the rates of HTTP Code 3xx responses (redirection) received from the Kubernetes API Server per response code. |
+| kubernetes.client.http.response.4xx | Meter | Tracking the rates of HTTP Code 4xx responses (client error) received from the Kubernetes API Server per response code. |
+| kubernetes.client.http.response.5xx | Meter | Tracking the rates of HTTP Code 5xx responses (server error) received from the Kubernetes API Server per response code. |
+| kubernetes.client.<ResourceName>.<Method> | Meter | Tracking the rates of HTTP requests for a combination of one Kubernetes resource and one HTTP method |
+| kubernetes.client.<NamespaceName>.<ResourceName>.<Method> | Meter | Tracking the rates of HTTP requests for a combination of one namespace-scoped Kubernetes resource and one HTTP method |
+
+### Forward Metrics to Prometheus
+
+In this section, we will show you how to forward Spark Operator metrics to [Prometheus](https://prometheus.io).
+
+* Modify the metrics properties section in the file `build-tools/helm/spark-kubernetes-operator/values.yaml`:
+
+```properties
+metrics.properties: |+
+  spark.metrics.conf.operator.sink.prometheus.class=org.apache.spark.kubernetes.operator.metrics.sink.PrometheusPullModelSink
+```
+
+* Install Spark Operator
+
+```bash
+helm install spark-kubernetes-operator -f build-tools/helm/spark-kubernetes-operator/values.yaml build-tools/helm/spark-kubernetes-operator/
+```
+
+* Install Prometheus via Helm Chart
+
+```bash
+helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
+helm install prometheus prometheus-community/prometheus
+```
+
+* Find and Annotate Spark Operator Pods
+
+```bash
+kubectl get pods -l app.kubernetes.io/name=spark-kubernetes-operator
+NAME                                         READY   STATUS    RESTARTS   AGE
+spark-kubernetes-operator-598cb5d569-bvvd2   1/1     Running   0          24m
+
+kubectl annotate pods spark-kubernetes-operator-598cb5d569-bvvd2 prometheus.io/scrape=true
+kubectl annotate pods spark-kubernetes-operator-598cb5d569-bvvd2 prometheus.io/path=/prometheus
+kubectl annotate pods spark-kubernetes-operator-598cb5d569-bvvd2 prometheus.io/port=19090
+```
+
+* Check Metrics via Prometheus UI
+
+```bash
+kubectl get pods | grep "prometheus-server"
+prometheus-server-654bc74fc9-8hgkb   2/2   Running   0   59m
+
+kubectl port-forward --address 0.0.0.0 pod/prometheus-server-654bc74fc9-8hgkb 8080:9090
+```
+
+Open your browser at `localhost:8080`. Click on the Status → Targets tab; you should be able to find the target as below.
+[<img src="resources/prometheus.png">](resources/prometheus.png)
diff --git a/docs/operations.md b/docs/operations.md
new file mode 100644
index 0000000..84fac03
--- /dev/null
+++ b/docs/operations.md
@@ -0,0 +1,151 @@
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+### Compatibility
+
+- Java 17 and 21
+- Kubernetes version compatibility:
+  + k8s version >= 1.28 is recommended. The Operator attempts to be as API-compatible as possible, but patch support will not be provided for k8s versions that have reached EOL.
+- Spark versions 3.5 or above.
+
+### Spark Application Namespaces
+
+By default, Spark applications are created in the same namespace as the operator deployment. You may also configure the chart deployment to add the necessary RBAC resources for applications, enabling them to run in additional namespaces.
+
+## Overriding configuration parameters during Helm install
+
+Helm provides different ways to override the default installation parameters (contained in `values.yaml`) for the Helm chart.
+in `values.yaml`) for the Helm chart.
+
+To override single parameters you can use `--set`, for example:
+
+```
+helm install --set image.repository=<my_registry>/spark-kubernetes-operator \
+ -f build-tools/helm/spark-kubernetes-operator/values.yaml \
+ build-tools/helm/spark-kubernetes-operator/
+```
+
+You can also provide multiple custom values files by using the `-f` flag; the last one takes higher precedence:
+
+```
+helm install spark-kubernetes-operator \
+ -f build-tools/helm/spark-kubernetes-operator/values.yaml \
+ -f my_values.yaml \
+ build-tools/helm/spark-kubernetes-operator/
+```
+
+The configurable parameters of the Helm chart and their default values are detailed in the following table:
+
+| Parameters | Description | Default value |
+|---|---|---|
+| image.repository | The image repository of spark-kubernetes-operator. | spark-kubernetes-operator |
+| image.pullPolicy | The image pull policy of spark-kubernetes-operator. | IfNotPresent |
+| image.tag | The image tag of spark-kubernetes-operator. | 0.1.0-SNAPSHOT |
+| image.digest | The image digest of spark-kubernetes-operator. If set then it takes precedence and the image tag will be ignored. | |
+| imagePullSecrets | The image pull secrets of spark-kubernetes-operator. | |
+| operatorDeployment.replica | Operator replica count. Must be 1 unless leader election is configured. | 1 |
+| operatorDeployment.strategy.type | Operator pod upgrade strategy. Must be Recreate unless leader election is configured. | Recreate |
+| operatorDeployment.operatorPod.annotations | Custom annotations to be added to the operator pod. | |
+| operatorDeployment.operatorPod.labels | Custom labels to be added to the operator pod. | |
+| operatorDeployment.operatorPod.nodeSelector | Custom nodeSelector to be added to the operator pod. | |
+| operatorDeployment.operatorPod.topologySpreadConstraints | Custom topologySpreadConstraints to be added to the operator pod. | |
+| operatorDeployment.operatorPod.dnsConfig | DNS configuration to be used by the operator pod. | |
+| operatorDeployment.operatorPod.volumes | Additional volumes to be added to the operator pod. | |
+| operatorDeployment.operatorPod.priorityClassName | Priority class name to be used for the operator pod. | |
+| operatorDeployment.operatorPod.securityContext | Security context overrides for the operator pod. | |
+| operatorDeployment.operatorContainer.jvmArgs | JVM arg override for the operator container. | `"-Dfile.encoding=UTF8"` |
+| operatorDeployment.operatorContainer.env | Custom env to be added to the operator container. | |
+| operatorDeployment.operatorContainer.envFrom | Custom envFrom to be added to the operator container, e.g. for downward API. | |
+| operatorDeployment.operatorContainer.probes | Probe config for the operator container. | |
+| operatorDeployment.operatorContainer.securityContext | Security context overrides for the operator container. | run as non-root for baseline security standard compliance |
+| operatorDeployment.operatorContainer.resources | Resources for the operator container. | memory 4Gi, ephemeral storage 2Gi and 1 cpu |
+| operatorDeployment.additionalContainers | Additional containers to be added to the operator pod, e.g. sidecar. | |
+| operatorRbac.serviceAccount.create | Whether to create service account for operator to use. | true |
+| operatorRbac.serviceAccount.name | Name of the operator service account. | `"spark-operator"` |
+| operatorRbac.clusterRole.create | Whether to create ClusterRole for operator to use. | true |
+| operatorRbac.clusterRole.name | Name of the operator ClusterRole. | `"spark-operator-clusterrole"` |
+| operatorRbac.clusterRoleBinding.create | Whether to create ClusterRoleBinding for operator to use. | true |
+| operatorRbac.clusterRoleBinding.name | Name of the operator ClusterRoleBinding. | `"spark-operator-clusterrolebinding"` |
+| operatorRbac.role.create | Whether to create Role for operator to use in each workload namespace(s). At least one of `clusterRole.create` or `role.create` should be enabled. | false |
+| operatorRbac.role.name | Name of the operator Role. | `"spark-operator-role"` |
+| operatorRbac.roleBinding.create | Whether to create RoleBinding for operator to use. At least one of `clusterRoleBinding.create` or `roleBinding.create` should be enabled. | false |
+| operatorRbac.roleBinding.name | Name of the operator RoleBinding in each workload namespace(s). | `"spark-operator-rolebinding"` |
+| operatorRbac.roleBinding.roleRef | RoleRef for the created Operator RoleBinding. Override this when you want the created RoleBinding to refer to a ClusterRole / Role that's different from the default operator Role. | Refers to default `operatorRbac.role.name` |
+| operatorRbac.configManagement.create | Enable this to create a Role for operator configuration management (hot property loading and leader election). | true |
+| operatorRbac.configManagement.roleName | Role name for operator configuration management. | `spark-operator-config-role` |
+| operatorRbac.configManagement.roleBinding | RoleBinding name for operator configuration management. | `"spark-operator-config-monitor-role-binding"` |
+| operatorRbac.labels | Labels to be applied on all created `operatorRbac` resources. | `"app.kubernetes.io/component": "operator-rbac"` |
+| workloadResources.namespaces.create | Whether to create dedicated namespaces for Spark workload. | true |
+| workloadResources.namespaces.overrideWatchedNamespaces | When enabled, operator would by default only watch namespace(s) provided in data field. | true |
+| workloadResources.namespaces.data | List of namespaces to create for Spark workload. The chart namespace would be used if this is empty. | |
+| workloadResources.clusterRole.create | When enabled, a ClusterRole would be created for Spark workload to use. | true |
+| workloadResources.clusterRole.name | Name of the Spark workload ClusterRole. | "spark-workload-clusterrole" |
+| workloadResources.role.create | When enabled, a Role would be created in each namespace for Spark workload. At least one of `clusterRole.create` or `role.create` should be enabled. | false |
+| workloadResources.role.name | Name for Spark workload Role. | "spark-workload-role" |
+| workloadResources.roleBinding.create | When enabled, a RoleBinding would be created in each namespace for Spark workload. This shall be enabled unless access is configured from 3rd party. | true |
+| workloadResources.roleBinding.name | Name of the Spark workload RoleBinding. | "spark-workload-rolebinding" |
+| workloadResources.serviceAccounts.create | Whether to create a service account for Spark workload. | true |
+| workloadResources.serviceAccounts.name | The name of Spark workload service account. | `spark` |
+| workloadResources.labels | Labels to be applied for all workload resources. | `"app.kubernetes.io/component": "spark-workload"` |
+| workloadResources.annotations | Annotations to be applied for all workload resources. | `"helm.sh/resource-policy": keep` |
+| workloadResources.sparkApplicationSentinel.create | If enabled, sentinel resources will be created for operator to watch and reconcile for the health probe purpose. | false |
+| workloadResources.sparkApplicationSentinel.sentinelNamespaces | A list of namespaces where sentinel resources will be created in. Note that these namespaces have to be a subset of `workloadResources.namespaces.data`. | |
+| operatorConfiguration.append | If set to true, below conf file & properties would be appended to default conf. Otherwise, they would override default properties. | true |
+| operatorConfiguration.log4j2.properties | The default log4j2 configuration. | Refer to default [log4j2.properties](../build-tools/helm/spark-kubernetes-operator/conf/log4j2.properties) |
+| operatorConfiguration.spark-operator.properties | The default operator configuration. | |
+| operatorConfiguration.metrics.properties | The default operator metrics (sink) configuration. | |
+| operatorConfiguration.dynamicConfig.create | If set to true, a config map would be created & watched by operator as source of truth for hot properties loading. | false |
+| operatorConfiguration.dynamicConfig.enable | If set to true, operator would honor the created config map as source of truth for hot properties loading. | false |
+| operatorConfiguration.dynamicConfig.annotations | Annotations to be applied for the dynamicConfig resources. | `"helm.sh/resource-policy": keep` |
+| operatorConfiguration.dynamicConfig.data | Data field (key-value pairs) that acts as hot properties in the config map. | `spark.kubernetes.operator.reconciler.intervalSeconds: "60"` |
+
+For more information check the [Helm
documentation](https://helm.sh/docs/helm/helm_install/).
+
+__Notice__: The pod resources should be sized for your workload in different environments
+to achieve a matching K8s pod QoS class. See also
+[Pod Quality of Service Classes](https://kubernetes.io/docs/concepts/workloads/pods/pod-qos/#quality-of-service-classes).
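+
+As an illustration, several of the values from the table above can be overridden together
+in a custom values file passed to `helm install -f`. This is only a sketch with
+illustrative values; consult the chart's `values.yaml` for the authoritative keys:
+
+```yaml
+workloadResources:
+  clusterRole:
+    create: true
+  serviceAccounts:
+    create: true
+    name: spark
+operatorConfiguration:
+  # append to (rather than override) the default properties
+  append: true
+  dynamicConfig:
+    # create and honor a ConfigMap for hot property loading
+    create: true
+    enable: true
+    data:
+      spark.kubernetes.operator.reconciler.intervalSeconds: "60"
+```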
+
+## Operator Health (Liveness) Probe with Sentinel Resource
+
+Borrowing from the
+[Apache Flink Operator](https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/docs/operations/health/#canary-resources),
+a dummy Spark application resource in any watched namespace can back the Spark operator's
+health probe.
+
+Below is an example of a sentinel resource. It carries the label
+`"spark.operator/sentinel": "true"` and does not result in the creation of any other
+Kubernetes resources. Controlled by the property
+`health.sentinel.resource.reconciliation.delay.seconds`, the timeout to reconcile the
+sentinel resources defaults to 60 seconds. If the operator cannot reconcile these
+resources within that time, the health probe returns HTTP code 500 when the kubelet sends
+an HTTP GET to the liveness endpoint, and the kubelet then kills and restarts the Spark
+operator container.
+
+```yaml
+apiVersion: spark.apache.org/v1alpha1
+kind: SparkApplication
+metadata:
+ name: spark-sentinel-resources
+ labels:
+ "spark.operator/sentinel": "true"
+```
diff --git a/docs/resources/application_state_machine.png
b/docs/resources/application_state_machine.png
new file mode 100644
index 0000000..3b3df3d
Binary files /dev/null and b/docs/resources/application_state_machine.png differ
diff --git a/docs/resources/cluster_state_machine.png
b/docs/resources/cluster_state_machine.png
new file mode 100644
index 0000000..2a8dcdd
Binary files /dev/null and b/docs/resources/cluster_state_machine.png differ
diff --git a/docs/resources/prometheus.png b/docs/resources/prometheus.png
new file mode 100644
index 0000000..5507d57
Binary files /dev/null and b/docs/resources/prometheus.png differ
diff --git a/docs/spark_custom_resources.md b/docs/spark_custom_resources.md
new file mode 100644
index 0000000..aef3377
--- /dev/null
+++ b/docs/spark_custom_resources.md
@@ -0,0 +1,227 @@
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+## Spark Operator API
+
+The core user-facing API of the Spark Kubernetes Operator consists of the
+`SparkApplication` and `SparkCluster` Custom Resource Definitions (CRDs). A Spark custom
+resource extends the standard k8s API, defines the Spark application spec, and tracks its
+status.
+
+Once the Spark Operator is installed and running in your Kubernetes environment, it
+continuously watches the SparkApplication(s) and SparkCluster(s) submitted by the user via
+a k8s API client or kubectl, and orchestrates secondary resources (pods, configmaps, etc.).
+
+Please also check the [quickstart](../README.md) for installing the operator.
+
+## SparkApplication
+
+A SparkApplication can be defined in YAML format. Users may configure the application
+entrypoint and its configuration. Let's start with the
+[Spark-Pi example](../examples/pi.yaml):
+
+```yaml
+apiVersion: spark.apache.org/v1alpha1
+kind: SparkApplication
+metadata:
+ name: pi
+spec:
+ # Entry point for the app
+ mainClass: "org.apache.spark.examples.SparkPi"
+ jars: "local:///opt/spark/examples/jars/spark-examples.jar"
+ sparkConf:
+ spark.dynamicAllocation.enabled: "true"
+ spark.dynamicAllocation.shuffleTracking.enabled: "true"
+ spark.dynamicAllocation.maxExecutors: "3"
+ spark.log.structuredLogging.enabled: "false"
+ spark.kubernetes.authenticate.driver.serviceAccountName: "spark"
+ spark.kubernetes.container.image: "apache/spark:4.0.0-preview2"
+ applicationTolerations:
+ resourceRetainPolicy: OnFailure
+ runtimeVersions:
+ scalaVersion: "2.13"
+ sparkVersion: "4.0.0-preview2"
+```
+
+After the application is submitted, the operator adds status information to your
+application based on the observed state:
+
+```shell
+kubectl get sparkapp pi -o yaml
+```
+
+### Write and build your SparkApplication
+
+It's straightforward to convert a spark-submit application to a `SparkApplication` yaml:
+the operator constructs the driver spec in a similar way. To submit a Java / Scala
+application, use `.spec.jars` and `.spec.mainClass`. Similarly, set `pyFiles` for Python
+applications.
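+
+For instance, a Python application could be sketched as follows, mirroring the Spark-Pi
+example above. The script path, resource name, and image tag are placeholders, not part of
+a shipped example:
+
+```yaml
+apiVersion: spark.apache.org/v1alpha1
+kind: SparkApplication
+metadata:
+  name: pyspark-example
+spec:
+  # entry point for a Python app; the path inside the image is a placeholder
+  pyFiles: "local:///opt/spark/examples/src/main/python/pi.py"
+  sparkConf:
+    spark.kubernetes.authenticate.driver.serviceAccountName: "spark"
+    spark.kubernetes.container.image: "apache/spark:4.0.0-preview2"
+  runtimeVersions:
+    sparkVersion: "4.0.0-preview2"
+```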
+
+When building images for the driver and executor, it's recommended to use the official
+[Spark Docker](https://github.com/apache/spark-docker) images as the base. Also check the
+pod template support (`.spec.driverSpec.podTemplateSpec` and
+`.spec.executorSpec.podTemplateSpec`) for setting a custom Spark home and work dir.
+
+### Pod Template Support
+
+It is possible to configure a pod template for driver & executor pods to set spec fields
+that are not configurable through SparkConf.
+
+The Spark Operator supports defining a pod template for driver and executor pods in two
+ways:
+
+1. Set `PodTemplateSpec` in `SparkApplication`
+2. Config `spark.kubernetes.[driver/executor].podTemplateFile`
+
+If the pod template spec is set in the application spec (option 1), it takes precedence
+over option 2, and `spark.kubernetes.[driver/executor].podTemplateFile` is unset to avoid
+conflicting overrides.
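+
+For example, option 1 might look like the following sketch. The labels and node selector
+are illustrative, not required fields:
+
+```yaml
+spec:
+  driverSpec:
+    podTemplateSpec:
+      metadata:
+        labels:
+          team: data-platform
+      spec:
+        # illustrative: pin driver pods to a dedicated node pool
+        nodeSelector:
+          node-pool: spark-drivers
+```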
+
+When the pod template is set as a remote file in conf properties (option 2), please ensure
+the Spark Operator has the necessary permissions to access the remote file location, e.g.
+deploy the operator with a proper workload identity that has access to the target S3 /
+Cloud Storage bucket. Similar permission requirements also apply to the driver pod: the
+operator needs access to the template file to create the driver, and the driver needs the
+same to create executors.
+
+Please be advised that Spark still overrides the necessary pod configuration in both
+options. For more details, refer to the
+[Spark doc](https://spark.apache.org/docs/latest/running-on-kubernetes.html#pod-template).
+
+## Understanding Failure Types
+
+In addition to the general `Failed` state (the driver pod fails, or the driver container
+exits with a non-zero code), the Spark Operator introduces a few more specific failure
+states for ease of high-level app status monitoring, and for ease of setting up different
+handlers when users create / manage SparkApplications with external microservices or
+workflow engines.
+
+The Spark Operator recognizes "infrastructure failures" on a best-effort basis. It is
+possible to configure different restart policies for general failure(s) vs. potential
+infrastructure failure(s). For example, you may configure the app to restart only upon
+infrastructure failures. If a Spark application fails as a result of
+
+```
+DriverStartTimedOut
+ExecutorsStartTimedOut
+SchedulingFailure
+```
+
+it is more likely that the app failed for infrastructure reasons, including scenarios
+where the driver or executors cannot be scheduled or cannot initialize within the
+configured time window for scheduler reasons: insufficient capacity, no IP allocated,
+images that cannot be pulled, k8s API server issues at scheduling time, etc.
+
+Please be advised that this is best-effort failure identification. You may still need to
+debug the actual failure from the driver pods. The Spark Operator stages the last observed
+driver pod status with the stopping state for audit purposes.
+
+## Configure the Tolerations for SparkApplication
+
+### Restart
+
+The Spark Operator enables configuring app restart behavior for different failure types.
+Here's a sample restart config snippet:
+
+``` yaml
+restartConfig:
+  # acceptable values are 'Never', 'Always', 'OnFailure' and 'OnInfrastructureFailure'
+  restartPolicy: Never
+  # the operator retries the application if configured. All resources from the current
+  # attempt are deleted before starting the next attempt
+  maxRestartAttempts: 3
+  # backoff time (in millis) that the operator waits before the next attempt
+  restartBackoffMillis: 30000
+```
+
+### Timeouts
+
+It's possible to configure applications to be proactively terminated and
resubmitted in particular
+cases to avoid resource deadlock.
+
+
+| Field                                                                                   | Type    | Default Value | Description                                                                     |
+|-----------------------------------------------------------------------------------------|---------|---------------|---------------------------------------------------------------------------------|
+| .spec.applicationTolerations.applicationTimeoutConfig.driverStartTimeoutMillis          | integer | 300000        | Time to wait for the driver to reach running state after it is requested.       |
+| .spec.applicationTolerations.applicationTimeoutConfig.executorStartTimeoutMillis        | integer | 300000        | Time to wait for the driver to acquire the minimal number of running executors. |
+| .spec.applicationTolerations.applicationTimeoutConfig.forceTerminationGracePeriodMillis | integer | 300000        | Time to wait before force-deleting resources at the end of an attempt.          |
+| .spec.applicationTolerations.applicationTimeoutConfig.driverReadyTimeoutMillis          | integer | 300000        | Time to wait for the driver to reach ready state.                               |
+| .spec.applicationTolerations.applicationTimeoutConfig.terminationRequeuePeriodMillis    | integer | 2000          | Back-off time before releasing resources is re-attempted for the application.   |
+
+
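+The fields above map onto a config block like the following sketch. The values shown are
+the defaults from the table and are illustrative:
+
+```yaml
+applicationTolerations:
+  applicationTimeoutConfig:
+    # fail the attempt if the driver is not running within 5 min of being requested
+    driverStartTimeoutMillis: 300000
+    # fail the attempt if the minimal executor count is not reached in time
+    executorStartTimeoutMillis: 300000
+    driverReadyTimeoutMillis: 300000
+    forceTerminationGracePeriodMillis: 300000
+    terminationRequeuePeriodMillis: 2000
+```
+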
+### Instance Config
+
+Instance Config helps the operator decide whether an application is running healthily.
+When the underlying cluster has a batch scheduler enabled, you may configure apps to be
+started if and only if there are sufficient resources. If, however, the cluster does not
+have a batch scheduler, the operator may help avoid apps hanging with an `InstanceConfig`
+that describes the bare minimal tolerable scenario.
+
+For example, with the spec below:
+
+```yaml
+applicationTolerations:
+ instanceConfig:
+ minExecutors: 3
+ initExecutors: 5
+ maxExecutors: 10
+sparkConf:
+ spark.executor.instances: "10"
+```
+
+Spark tries to bring up 10 executors as defined in the SparkConf. In addition, from the
+operator's perspective:
+
+* If the Spark app acquires fewer than 5 executors in the given time window
+  (`.spec.applicationTolerations.applicationTimeoutConfig.executorStartTimeoutMillis`)
+  after being submitted, it is shut down proactively in order to avoid resource deadlock.
+* The Spark app is marked as 'RunningWithBelowThresholdExecutors' if it loses executors
+  after starting up successfully.
+* The Spark app is marked as 'RunningHealthy' if it has at least the min number of
+  executors after starting up successfully.
+
+### Delete Resources On Termination
+
+By default, the operator deletes all created resources at the end of an attempt. It tries
+to record the last observed driver status in the `status` field of the application for
+troubleshooting purposes.
+
+On the other hand, when developing an application, it's possible to configure
+
+```yaml
+applicationTolerations:
+  # Acceptable values are 'Always', 'OnFailure', 'Never'
+  resourceRetainPolicy: OnFailure
+```
+
+to prevent the operator from deleting the driver pod and driver resources if the app
+fails. Similarly, if `resourceRetainPolicy` is set to `Always`, the operator does not
+delete driver resources when the app ends. Note that this applies only to
+operator-created resources (driver pod, SparkConf configmap, etc.). You may also want to
+tune `spark.kubernetes.driver.service.deleteOnTermination` and
+`spark.kubernetes.executor.deleteOnTermination` to control the behavior of driver-created
+resources.
+
+## Spark Cluster
+
+The Spark Operator also supports launching Spark clusters on k8s via the `SparkCluster`
+custom resource, which takes minimal effort to specify the desired master and worker
+instance spec.
+
+To deploy a Spark cluster, you may start by specifying the desired Spark version and
+worker count, as well as the SparkConf, as in the
+[example](../examples/qa-cluster-with-one-worker.yaml). Master & worker instances are
+deployed as
+[StatefulSets](https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/)
+and exposed via k8s
+[service(s)](https://kubernetes.io/docs/concepts/services-networking/service/).
+
+Like the pod template support for applications, it's also possible to submit template(s)
+for the Spark instances of a `SparkCluster` to configure spec fields that are not
+supported via SparkConf. It's worth noting that Spark may overwrite certain fields.
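+
+As an illustration only, a minimal `SparkCluster` might look like the sketch below. The
+field names here are assumptions modeled on the `SparkApplication` examples above;
+consult the linked example and the CRD for the authoritative schema:
+
+```yaml
+apiVersion: spark.apache.org/v1alpha1
+kind: SparkCluster
+metadata:
+  name: qa-cluster
+spec:
+  runtimeVersions:
+    sparkVersion: "4.0.0-preview2"
+  sparkConf:
+    spark.kubernetes.container.image: "apache/spark:4.0.0-preview2"
+```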
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]