This is an automated email from the ASF dual-hosted git repository.
dongjoon pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/spark-kubernetes-operator.git
The following commit(s) were added to refs/heads/main by this push:
new de525e2 [SPARK-49464] Add documentations
de525e2 is described below
commit de525e2cd96f32d07f0a9541a28f6919e34360dc
Author: zhou-jiang <[email protected]>
AuthorDate: Fri Oct 4 16:34:29 2024 -0700
[SPARK-49464] Add documentations
### What changes were proposed in this pull request?
This PR includes Operator docs under `docs/` for configuration,
architecture, operations, and metrics.
### Why are the changes needed?
Operator docs are necessary for users to understand the design and to get
started with operator installation
### Does this PR introduce _any_ user-facing change?
No - new release
### How was this patch tested?
CIs
### Was this patch authored or co-authored using generative AI tooling?
No
Closes #113 from jiangzho/doc.
Authored-by: zhou-jiang <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
---
build-tools/docs-utils/build.gradle | 2 +
docs/architecture.md | 81 ++++++++++
docs/config_properties.md | 40 +++++
docs/configuration.md | 144 +++++++++++++++++
docs/operations.md | 151 ++++++++++++++++++
docs/resources/application_state_machine.png | Bin 0 -> 82299 bytes
docs/resources/cluster_state_machine.png | Bin 0 -> 15835 bytes
docs/resources/prometheus.png | Bin 0 -> 184821 bytes
docs/spark_custom_resources.md | 227 +++++++++++++++++++++++++++
9 files changed, 645 insertions(+)
diff --git a/build-tools/docs-utils/build.gradle b/build-tools/docs-utils/build.gradle
index 2cdde29..5ce8e64 100644
--- a/build-tools/docs-utils/build.gradle
+++ b/build-tools/docs-utils/build.gradle
@@ -38,3 +38,5 @@ tasks.register('generateConfPropsDoc', Exec) {
description = "Generate config properties doc for operator"
commandLine "java", "-classpath",
sourceSets.main.runtimeClasspath.getAsPath(), javaMainClass, docsPath
}
+
+build.finalizedBy(generateConfPropsDoc)
diff --git a/docs/architecture.md b/docs/architecture.md
new file mode 100644
index 0000000..0539355
--- /dev/null
+++ b/docs/architecture.md
@@ -0,0 +1,81 @@
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Design & Architecture
+
+**Spark-Kubernetes-Operator** (Operator) acts as a control plane to manage the complete deployment lifecycle of Spark applications and clusters. The Operator can be installed on Kubernetes cluster(s) using Helm. In most production environments it is deployed in a designated namespace and controls Spark workloads in one or more managed namespaces. The Spark Operator enables users to describe Spark application(s) or cluster(s) as [Custom Resources](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/).
+
+The Operator continuously tracks events related to the Spark custom resources in its reconciliation loops:
+
+For SparkApplications:
+
+* User submits a SparkApplication custom resource (CR) using kubectl / API
+* Operator launches the driver and observes its status
+* Operator observes driver-spawned resources (e.g. executors) and records their status until the app terminates
+* Operator releases all Spark-app-owned resources back to the cluster
+
+For SparkClusters:
+
+* User submits a SparkCluster custom resource (CR) using kubectl / API
+* Operator launches master and worker(s) based on the CR spec and observes their status
+* Operator releases all Spark-cluster-owned resources back to the cluster upon failure
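+
+As an illustration of the submission step, a minimal `SparkApplication` CR might look like the sketch below. The `apiVersion` and all `spec` fields here are illustrative placeholders, not the authoritative CRD schema; consult the custom resource docs for the exact field names.
+
+```yaml
+apiVersion: spark.apache.org/v1alpha1   # hypothetical group/version
+kind: SparkApplication
+metadata:
+  name: spark-pi
+  namespace: spark-workloads
+spec:
+  # Illustrative fields only; check the CRD docs for the actual schema.
+  mainClass: org.apache.spark.examples.SparkPi
+  sparkConf:
+    spark.executor.instances: "2"
+```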
+
+The Operator is built with the [Java Operator SDK](https://javaoperatorsdk.io/) for launching Spark deployments and submitting jobs under the hood. It also uses the [fabric8](https://fabric8.io/) client to interact with the Kubernetes API Server.
+
+## Application State Transition
+
+[<img src="resources/application_state_machine.png">](resources/application_state_machine.png)
+
+* Spark applications are expected to run from submitted to succeeded before releasing resources
+* User may configure the app CR to time out if it cannot reach a healthy state within a given threshold. The timeout can be configured for different lifecycle stages, e.g. when the driver is starting and when requesting executor pods. To update the default threshold, configure `.spec.applicationTolerations.applicationTimeoutConfig` for the application.
+* K8s resources created for an application would be deleted as the final stage of the application lifecycle by default. This is to ensure resource quota release for completed applications.
+* It is also possible to retain the created k8s resources for debug or audit purposes. To do so, users may set `.spec.applicationTolerations.resourceRetainPolicy` to `OnFailure` to retain resources upon application failure, or to `Always` to retain resources regardless of the application's final state.
+  - This controls the behavior of k8s resources created by the Operator for the application, including the driver pod, config map, service, and PVC (if enabled). It does not apply to resources created by the driver (for example, executor pods). Users may configure SparkConf to include `spark.kubernetes.executor.deleteOnTermination` for executor retention. Please refer to the [Spark docs](https://spark.apache.org/docs/latest/running-on-kubernetes.html) for details.
+  - The created k8s resources have an `ownerReference` to their related `SparkApplication` custom resource, such that they can be garbage collected when the `SparkApplication` is deleted.
+  - Please be advised that k8s resources would not be retained if the application is configured to restart. This is to avoid unexpected resource quota usage increases or resource conflicts among multiple attempts.
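+
+The tolerations described above could be expressed in the CR spec roughly as in the sketch below. Only the `applicationTolerations`, `applicationTimeoutConfig`, and `resourceRetainPolicy` paths are taken from this doc; the nested timeout keys are illustrative and should be checked against the custom resource reference.
+
+```yaml
+spec:
+  applicationTolerations:
+    # Retain operator-created resources when the app fails, for debugging.
+    resourceRetainPolicy: OnFailure
+    applicationTimeoutConfig:
+      # Illustrative keys for per-stage timeouts (driver start, executor request).
+      driverStartTimeoutMillis: 300000
+      executorStartTimeoutMillis: 300000
+```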
+
+## Cluster State Transition
+
+[<img src="resources/cluster_state_machine.png">](resources/cluster_state_machine.png)
+
+* Spark clusters are expected to keep running after submission.
+* Similar to Spark applications, K8s resources created for a cluster would be deleted as the final stage of the cluster lifecycle by default.
diff --git a/docs/config_properties.md b/docs/config_properties.md
new file mode 100644
index 0000000..c86ada2
--- /dev/null
+++ b/docs/config_properties.md
@@ -0,0 +1,40 @@
+[//]: # (This doc is automatically generated by a gradle task; manual updates will be overridden.)
+# Spark Operator Config Properties
+ | Key | Type | Default Value | Allow Hot Reloading | Description |
+ | --- | --- | --- | --- | --- |
+ | spark.kubernetes.operator.name | String | spark-kubernetes-operator | false | Name of the operator. |
+ | spark.kubernetes.operator.namespace | String | default | false | Namespace that operator is deployed within. |
+ | spark.kubernetes.operator.watchedNamespaces | String | default | true | Comma-separated list of namespaces that the operator would be watching for Spark resources. If set to '*', operator would watch all namespaces. |
+ | spark.kubernetes.operator.terminateOnInformerFailureEnabled | Boolean | false | false | Enable to indicate informer errors should stop operator startup. If disabled, operator startup will ignore recoverable errors, caused for example by RBAC issues, and will retry periodically. |
+ | spark.kubernetes.operator.reconciler.terminationTimeoutSeconds | Integer | 30 | false | Grace period for operator shutdown before reconciliation threads are killed. |
+ | spark.kubernetes.operator.reconciler.parallelism | Integer | 50 | false | Thread pool size for Spark Operator reconcilers. An unbounded pool would be used if set to a non-positive number. |
+ | spark.kubernetes.operator.reconciler.foregroundRequestTimeoutSeconds | Long | 30 | true | Timeout (in seconds) for requests made to API server. This applies only to foreground requests. |
+ | spark.kubernetes.operator.reconciler.intervalSeconds | Long | 120 | true | Interval (in seconds, non-negative) to reconcile Spark applications. Note that reconciliation is always expected to be triggered when app spec / status is updated. This interval controls the reconcile behavior of operator reconciliation even when there's no update on SparkApplication, e.g. to determine whether a hanging app needs to be proactively terminated. Thus this is recommended to set to above 2 minutes t [...] |
+ | spark.kubernetes.operator.reconciler.trimStateTransitionHistoryEnabled | Boolean | true | true | When enabled, operator would trim state transition history when a new attempt starts, keeping previous attempt summary only. |
+ | spark.kubernetes.operator.reconciler.appStatusListenerClassNames | String | | false | Comma-separated names of SparkAppStatusListener class implementations |
+ | spark.kubernetes.operator.reconciler.clusterStatusListenerClassNames | String | | false | Comma-separated names of SparkClusterStatusListener class implementations |
+ | spark.kubernetes.operator.dynamicConfig.enabled | Boolean | false | false | When enabled, operator would use config map as source of truth for config property override. The config map needs to be created in spark.kubernetes.operator.namespace, and labeled with operator name. |
+ | spark.kubernetes.operator.dynamicConfig.selector | String | app.kubernetes.io/name=spark-kubernetes-operator,app.kubernetes.io/component=operator-dynamic-config-overrides | false | The selector str applied to dynamic config map. |
+ | spark.kubernetes.operator.dynamicConfig.reconcilerParallelism | Integer | 1 | false | Parallelism for dynamic config reconciler. An unbounded pool would be used if set to a non-positive number. |
+ | spark.kubernetes.operator.reconciler.rateLimiter.refreshPeriodSeconds | Integer | 15 | false | Operator rate limiter refresh period (in seconds) for each resource. |
+ | spark.kubernetes.operator.reconciler.rateLimiter.maxLoopForPeriod | Integer | 5 | false | Max number of reconcile loops triggered within the rate limiter refresh period for each resource. Setting the limit <= 0 disables the limiter. |
+ | spark.kubernetes.operator.reconciler.retry.initialIntervalSeconds | Integer | 5 | false | Initial interval (in seconds) of retries on unhandled controller errors. |
+ | spark.kubernetes.operator.reconciler.retry.intervalMultiplier | Double | 1.5 | false | Interval multiplier of retries on unhandled controller errors. Set this to 1 for linear retry. |
+ | spark.kubernetes.operator.reconciler.retry.maxIntervalSeconds | Integer | -1 | false | Max interval (in seconds) of retries on unhandled controller errors. Set to non-positive for unlimited. |
+ | spark.kubernetes.operator.api.retryMaxAttempts | Integer | 15 | false | Max attempts of retries on unhandled controller errors. Setting this to a non-positive value means no retry. |
+ | spark.kubernetes.operator.api.retryAttemptAfterSeconds | Long | 1 | false | Default time (in seconds) to wait till next request. This would be used if server does not set Retry-After in response. Setting this to a non-positive number means immediate retry. |
+ | spark.kubernetes.operator.api.statusPatchMaxAttempts | Long | 3 | false | Maximal number of retry attempts of requests to k8s server for resource status update. This would be performed on top of k8s client spark.kubernetes.operator.retry.maxAttempts to overcome potential conflicting update on the same SparkApplication. This should be a positive number. |
+ | spark.kubernetes.operator.api.secondaryResourceCreateMaxAttempts | Long | 3 | false | Maximal number of retry attempts of requesting secondary resource for Spark application. This would be performed on top of k8s client spark.kubernetes.operator.retry.maxAttempts to overcome potential conflicting reconcile on the same SparkApplication. This should be a positive number. |
+ | spark.kubernetes.operator.metrics.josdkMetricsEnabled | Boolean | true | false | When enabled, the josdk metrics will be added in metrics source and configured for operator. |
+ | spark.kubernetes.operator.metrics.clientMetricsEnabled | Boolean | true | false | Enable KubernetesClient metrics for measuring the HTTP traffic to the Kubernetes API Server. Since the metrics are collected via Okhttp interceptors, this can be disabled when opting in to customized interceptors. |
+ | spark.kubernetes.operator.metrics.clientMetricsGroupByResponseCodeEnabled | Boolean | true | false | When enabled, additional metrics grouped by http response code group (1xx, 2xx, 3xx, 4xx, 5xx) received from API server will be added. Users can disable it when their monitoring system can combine lower level kubernetes.client.http.response.<3-digit-response-code> metrics. |
+ | spark.kubernetes.operator.metrics.port | Integer | 19090 | false | The port used for checking metrics |
+ | spark.kubernetes.operator.health.probePort | Integer | 19091 | false | The port used for health/readiness check probe status. |
+ | spark.kubernetes.operator.health.sentinelExecutorPoolSize | Integer | 3 | false | Size of executor service in Sentinel Managers to check the health of sentinel resources. |
+ | spark.kubernetes.operator.health.sentinelResourceReconciliationDelaySeconds | Integer | 60 | true | Allowed max time (seconds) between spec update and reconciliation for sentinel resources. |
+ | spark.kubernetes.operator.leaderElection.enabled | Boolean | false | false | Enable leader election for the operator to allow running standby instances. When this is disabled, only one operator instance is expected to be up and running at any time (replica = 1) to avoid race conditions. |
+ | spark.kubernetes.operator.leaderElection.leaseName | String | spark-operator-lease | false | Leader election lease name, must be unique for leases in the same namespace. |
+ | spark.kubernetes.operator.leaderElection.leaseDurationSeconds | Integer | 180 | false | Leader election lease duration in seconds, non-negative. |
+ | spark.kubernetes.operator.leaderElection.renewDeadlineSeconds | Integer | 120 | false | Leader election renew deadline in seconds, non-negative. This needs to be smaller than the lease duration to allow the current leader to renew the lease before it expires. |
+ | spark.kubernetes.operator.leaderElection.retryPeriodSeconds | Integer | 5 | false | Leader election retry period in seconds, non-negative. |
+
diff --git a/docs/configuration.md b/docs/configuration.md
new file mode 100644
index 0000000..bafd3c5
--- /dev/null
+++ b/docs/configuration.md
@@ -0,0 +1,144 @@
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Configuration
+
+## Configure Operator
+
+Spark Operator supports different ways to configure the behavior:
+
+* **spark-operator.properties** provided when deploying the operator. In addition to the [property file](../build-tools/helm/spark-kubernetes-operator/conf/spark-operator.properties), it is also possible to override or append config properties in the helm [Values file](../build-tools/helm/spark-kubernetes-operator/values.yaml).
+* **System Properties**: when provided as system properties (e.g. via `-D` options to the operator JVM), they override the values provided in the property file.
+* **Hot property loading**: when enabled, a [configmap](https://kubernetes.io/docs/concepts/configuration/configmap/) would be created with the operator in the same namespace. The operator monitors updates performed on the configmap. Hot property reloading takes higher precedence compared with the default properties override.
+  - An example use case: the operator uses hot properties to determine the list of namespace(s) in which to operate Spark applications. The hot properties config map can be updated and maintained by a user or an additional microservice to tune the operator behavior without rebooting it.
+  - Please be advised that not all properties can be hot-loaded and honored at runtime. Refer to the list of [supported properties](./config_properties.md) for more details.
+
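+
+As a concrete sketch, a `spark-operator.properties` override might combine keys from the [supported properties list](./config_properties.md); the values below are arbitrary examples, not recommendations:
+
+```properties
+# Watch two dedicated namespaces instead of the default one
+spark.kubernetes.operator.watchedNamespaces=spark-team-a,spark-team-b
+# Reconcile interval in seconds (hot-reloadable)
+spark.kubernetes.operator.reconciler.intervalSeconds=180
+```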
+To enable hot properties loading, update the **helm chart values file** with
+
+```yaml
+operatorConfiguration:
+  spark-operator.properties: |+
+    spark.kubernetes.operator.dynamicConfig.enabled=true
+    # ... all other config overrides ...
+  dynamicConfig:
+    create: true
+```
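+
+When dynamic config is enabled, the hot-properties config map the operator watches could look like the sketch below. The labels match the default `spark.kubernetes.operator.dynamicConfig.selector` from the [config properties list](./config_properties.md), while the name, namespace, and data keys are illustrative:
+
+```yaml
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  name: spark-operator-dynamic-config   # illustrative name
+  namespace: spark-operator             # must be the operator's namespace
+  labels:
+    app.kubernetes.io/name: spark-kubernetes-operator
+    app.kubernetes.io/component: operator-dynamic-config-overrides
+data:
+  spark.kubernetes.operator.reconciler.intervalSeconds: "60"
+```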
+
+## Metrics
+
+The Spark operator, following [Apache Spark](https://spark.apache.org/docs/latest/monitoring.html#metrics), has a configurable metrics system based on the [Dropwizard Metrics Library](https://metrics.dropwizard.io/4.2.25/). Note that the Spark Operator does not have a Spark UI; MetricsServlet and PrometheusServlet from the org.apache.spark.metrics.sink package are not supported. If you are interested in Prometheus metrics exporting, please take a look at the section [Forward Metrics to Prometheus](#forward-metrics-to-prometheus) below.
+
+### JVM Metrics
+
+Spark Operator collects JVM metrics via [Codahale JVM Metrics](https://javadoc.io/doc/com.codahale.metrics/metrics-jvm/latest/index.html):
+
+- BufferPoolMetricSet
+- FileDescriptorRatioGauge
+- GarbageCollectorMetricSet
+- MemoryUsageGaugeSet
+- ThreadStatesGaugeSet
+
+### Kubernetes Client Metrics
+
+| Metrics Name | Type | Description |
+|---|---|---|
+| kubernetes.client.http.request | Meter | Tracking the rates of HTTP requests sent to the Kubernetes API Server |
+| kubernetes.client.http.response | Meter | Tracking the rates of HTTP responses from the Kubernetes API Server |
+| kubernetes.client.http.response.failed | Meter | Tracking the rates of HTTP requests which have no response from the Kubernetes API Server |
+| kubernetes.client.http.response.latency.nanos | Histogram | Measures the statistical distribution of HTTP response latency from the Kubernetes API Server |
+| kubernetes.client.http.response.<ResponseCode> | Meter | Tracking the rates of HTTP responses by response code from the Kubernetes API Server |
+| kubernetes.client.http.request.<RequestMethod> | Meter | Tracking the rates of HTTP requests by method type to the Kubernetes API Server |
+| kubernetes.client.http.response.1xx | Meter | Tracking the rates of HTTP Code 1xx responses (informational) received from the Kubernetes API Server per response code. |
+| kubernetes.client.http.response.2xx | Meter | Tracking the rates of HTTP Code 2xx responses (success) received from the Kubernetes API Server per response code. |
+| kubernetes.client.http.response.3xx | Meter | Tracking the rates of HTTP Code 3xx responses (redirection) received from the Kubernetes API Server per response code. |
+| kubernetes.client.http.response.4xx | Meter | Tracking the rates of HTTP Code 4xx responses (client error) received from the Kubernetes API Server per response code. |
+| kubernetes.client.http.response.5xx | Meter | Tracking the rates of HTTP Code 5xx responses (server error) received from the Kubernetes API Server per response code. |
+| kubernetes.client.<ResourceName>.<Method> | Meter | Tracking the rates of HTTP requests for a combination of one Kubernetes resource and one HTTP method |
+| kubernetes.client.<NamespaceName>.<ResourceName>.<Method> | Meter | Tracking the rates of HTTP requests for a combination of one namespace-scoped Kubernetes resource and one HTTP method |
+
+### Forward Metrics to Prometheus
+
+In this section, we will show you how to forward Spark Operator metrics to [Prometheus](https://prometheus.io).
+
+* Modify the metrics properties section in the file `build-tools/helm/spark-kubernetes-operator/values.yaml`:
+
+```properties
+metrics.properties: |+
+  spark.metrics.conf.operator.sink.prometheus.class=org.apache.spark.kubernetes.operator.metrics.sink.PrometheusPullModelSink
+```
+
+* Install Spark Operator
+
+```bash
+helm install spark-kubernetes-operator -f build-tools/helm/spark-kubernetes-operator/values.yaml build-tools/helm/spark-kubernetes-operator/
+```
+
+* Install Prometheus via Helm Chart
+
+```bash
+helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
+helm install prometheus prometheus-community/prometheus
+```
+
+* Find and Annotate Spark Operator Pods
+
+```bash
+kubectl get pods -l app.kubernetes.io/name=spark-kubernetes-operator
+NAME                                         READY   STATUS    RESTARTS   AGE
+spark-kubernetes-operator-598cb5d569-bvvd2   1/1     Running   0          24m
+
+kubectl annotate pods spark-kubernetes-operator-598cb5d569-bvvd2 prometheus.io/scrape=true
+kubectl annotate pods spark-kubernetes-operator-598cb5d569-bvvd2 prometheus.io/path=/prometheus
+kubectl annotate pods spark-kubernetes-operator-598cb5d569-bvvd2 prometheus.io/port=19090
+```
+
+* Check Metrics via Prometheus UI
+
+```bash
+kubectl get pods | grep "prometheus-server"
+prometheus-server-654bc74fc9-8hgkb   2/2   Running   0   59m
+
+kubectl port-forward --address 0.0.0.0 pod/prometheus-server-654bc74fc9-8hgkb 8080:9090
+```
+
+Open your browser at `localhost:8080`. Click on the Status → Targets tab; you should be able to find the target as below.
+[<img src="resources/prometheus.png">](resources/prometheus.png)
diff --git a/docs/operations.md b/docs/operations.md
new file mode 100644
index 0000000..84fac03
--- /dev/null
+++ b/docs/operations.md
@@ -0,0 +1,151 @@
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+### Compatibility
+
+- Java 17 and 21
+- Kubernetes version compatibility:
+  + k8s version >= 1.28 is recommended. The Operator attempts to be as API-compatible as possible, but patch support will not be provided for k8s versions that have reached EOL.
+- Spark versions 3.5 or above.
+
+### Spark Application Namespaces
+
+By default, Spark applications are created in the same namespace as the operator deployment. You may also configure the chart deployment to add the necessary RBAC resources for applications, enabling them to run in additional namespaces.
+
+## Overriding configuration parameters during Helm install
+
+Helm provides different ways to override the default installation parameters (contained in `values.yaml`) for the Helm chart.
+in `values.yaml`) for the Helm chart.
+
+To override single parameters you can use `--set`, for example:
+
+```
+helm install --set image.repository=<my_registry>/spark-kubernetes-operator \
+ -f build-tools/helm/spark-kubernetes-operator/values.yaml \
+ build-tools/helm/spark-kubernetes-operator/
+```
+
+You can also provide multiple custom values files by using the `-f` flag; the last one takes higher precedence:
+
+```
+helm install spark-kubernetes-operator \
+ -f build-tools/helm/spark-kubernetes-operator/values.yaml \
+ -f my_values.yaml \
+ build-tools/helm/spark-kubernetes-operator/
+```
+
+The configurable parameters of the Helm chart and their default values are detailed in the following table:
+
+| Parameters | Description | Default value |
+|---|---|---|
+| image.repository | The image repository of spark-kubernetes-operator. | spark-kubernetes-operator |
+| image.pullPolicy | The image pull policy of spark-kubernetes-operator. | IfNotPresent |
+| image.tag | The image tag of spark-kubernetes-operator. | 0.1.0-SNAPSHOT |
+| image.digest | The image digest of spark-kubernetes-operator. If set then it takes precedence and the image tag will be ignored. | |
+| imagePullSecrets | The image pull secrets of spark-kubernetes-operator. | |
+| operatorDeployment.replica | Operator replica count. Must be 1 unless leader election is configured. | 1 |
+| operatorDeployment.strategy.type | Operator pod upgrade strategy. Must be Recreate unless leader election is configured. | Recreate |
+| operatorDeployment.operatorPod.annotations | Custom annotations to be added to the operator pod. | |
+| operatorDeployment.operatorPod.labels | Custom labels to be added to the operator pod. | |
+| operatorDeployment.operatorPod.nodeSelector | Custom nodeSelector to be added to the operator pod. | |
+| operatorDeployment.operatorPod.topologySpreadConstraints | Custom topologySpreadConstraints to be added to the operator pod. | |
+| operatorDeployment.operatorPod.dnsConfig | DNS configuration to be used by the operator pod. | |
+| operatorDeployment.operatorPod.volumes | Additional volumes to be added to the operator pod. | |
+| operatorDeployment.operatorPod.priorityClassName | Priority class name to be used for the operator pod. | |
+| operatorDeployment.operatorPod.securityContext | Security context overrides for the operator pod. | |
+| operatorDeployment.operatorContainer.jvmArgs | JVM arg override for the operator container. | `"-Dfile.encoding=UTF8"` |
+| operatorDeployment.operatorContainer.env | Custom env to be added to the operator container. | |
+| operatorDeployment.operatorContainer.envFrom | Custom envFrom to be added to the operator container, e.g. for downward API. | |
+| operatorDeployment.operatorContainer.probes | Probe config for the operator container. | |
+| operatorDeployment.operatorContainer.securityContext | Security context overrides for the operator container. | run as non-root for baseline security standard compliance |
+| operatorDeployment.operatorContainer.resources | Resources for the operator container. | memory 4Gi, ephemeral storage 2Gi and 1 cpu |
+| operatorDeployment.additionalContainers | Additional containers to be added to the operator pod, e.g. sidecar. | |
+| operatorRbac.serviceAccount.create | Whether to create service account for operator to use. | true |
+| operatorRbac.serviceAccount.name | Name of the operator service account. | `"spark-operator"` |
+| operatorRbac.clusterRole.create | Whether to create ClusterRole for operator to use. | true |
+| operatorRbac.clusterRole.name | Name of the operator ClusterRole. | `"spark-operator-clusterrole"` |
+| operatorRbac.clusterRoleBinding.create | Whether to create ClusterRoleBinding for operator to use. | true |
+| operatorRbac.clusterRoleBinding.name | Name of the operator ClusterRoleBinding. | `"spark-operator-clusterrolebinding"` |
+| operatorRbac.role.create | Whether to create Role for operator to use in each workload namespace(s). At least one of `clusterRole.create` or `role.create` should be enabled. | false |
+| operatorRbac.role.name | Name of the operator Role. | `"spark-operator-role"` |
+| operatorRbac.roleBinding.create | Whether to create RoleBinding for operator to use. At least one of `clusterRoleBinding.create` or `roleBinding.create` should be enabled. | false |
+| operatorRbac.roleBinding.name | Name of the operator RoleBinding in each workload namespace(s). | `"spark-operator-rolebinding"` |
+| operatorRbac.roleBinding.roleRef | RoleRef for the created Operator RoleBinding. Override this when you want the created RoleBinding to refer to a ClusterRole / Role that's different from the default operator Role. | Refers to default `operatorRbac.role.name` |
+| operatorRbac.configManagement.create | Enable this to create a Role for operator configuration management (hot property loading and leader election). | true |
+| operatorRbac.configManagement.roleName | Role name for operator configuration management. | `spark-operator-config-role` |
+| operatorRbac.configManagement.roleBinding | RoleBinding name for operator configuration management. | `"spark-operator-config-monitor-role-binding"` |
+| operatorRbac.labels | Labels to be applied on all created `operatorRbac` resources. | `"app.kubernetes.io/component": "operator-rbac"` |
+| workloadResources.namespaces.create | Whether to create dedicated namespaces for Spark workload. | true |
+| workloadResources.namespaces.overrideWatchedNamespaces | When enabled, operator would by default only watch namespace(s) provided in data field. | true |
+| workloadResources.namespaces.data | List of namespaces to create for Spark workload. The chart namespace would be used if this is empty. | |
+| workloadResources.clusterRole.create | When enabled, a ClusterRole would be created for Spark workload to use. | true |
+| workloadResources.clusterRole.name | Name of the Spark workload ClusterRole. | "spark-workload-clusterrole" |
+| workloadResources.role.create | When enabled, a Role would be created in each namespace for Spark workload. At least one of `clusterRole.create` or `role.create` should be enabled. | false |
+| workloadResources.role.name | Name for Spark workload Role. | "spark-workload-role" |
+| workloadResources.roleBinding.create | When enabled, a RoleBinding would be created in each namespace for Spark workload. This shall be enabled unless access is configured from 3rd party. | true |
+| workloadResources.roleBinding.name | Name of the Spark workload RoleBinding. | "spark-workload-rolebinding" |
+| workloadResources.serviceAccounts.create | Whether to create a service account for Spark workload. | true |
+| workloadResources.serviceAccounts.name | The name of Spark workload service account. | `spark` |
+| workloadResources.labels | Labels to be applied for all workload resources. | `"app.kubernetes.io/component": "spark-workload"` |
+| workloadResources.annotations | Annotations to be applied for all workload resources. | `"helm.sh/resource-policy": keep` |
+| workloadResources.sparkApplicationSentinel.create | If enabled, sentinel resources will be created for operator to watch and reconcile for the health probe purpose. | false |
+| workloadResources.sparkApplicationSentinel.sentinelNamespaces | A list of namespaces where sentinel resources will be created in. Note that these namespaces have to be a subset of `workloadResources.namespaces.data`. | |
+| operatorConfiguration.append | If set to true, below conf file & properties would be appended to default conf. Otherwise, they would override default properties. | true |
+| operatorConfiguration.log4j2.properties | The default log4j2 configuration. | Refer to default [log4j2.properties](../build-tools/helm/spark-kubernetes-operator/conf/log4j2.properties) |
+| operatorConfiguration.spark-operator.properties | The default operator configuration. | |
+| operatorConfiguration.metrics.properties | The default operator metrics (sink) configuration. | |
+| operatorConfiguration.dynamicConfig.create | If set to true, a config map would be created & watched by operator as source of truth for hot properties loading. | false |
+| operatorConfiguration.dynamicConfig.enable | If set to true, operator would honor the created config map as source of truth for hot properties loading. | false |
+| operatorConfiguration.dynamicConfig.annotations | Annotations to be applied for the dynamicConfig resources. | `"helm.sh/resource-policy": keep` |
+| operatorConfiguration.dynamicConfig.data | Data field (key-value pairs) that acts as hot properties in the config map. | `spark.kubernetes.operator.reconciler.intervalSeconds: "60"` |
+
+For more information check the [Helm
documentation](https://helm.sh/docs/helm/helm_install/).
+
+__Notice__: The pod resources should be sized for your workload in different environments
+to achieve a matching K8s pod QoS class. See also
+[Pod Quality of Service Classes](https://kubernetes.io/docs/concepts/workloads/pods/pod-qos/#quality-of-service-classes).
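+
+As an illustration, several of the values from the table above can be overridden together
+in a custom values file passed to `helm install -f`. This is only a sketch with
+illustrative values; consult the chart's `values.yaml` for the authoritative keys:
+
+```yaml
+workloadResources:
+  clusterRole:
+    create: true
+  serviceAccounts:
+    create: true
+    name: spark
+operatorConfiguration:
+  # append to (rather than override) the default properties
+  append: true
+  dynamicConfig:
+    # create and honor a ConfigMap for hot property loading
+    create: true
+    enable: true
+    data:
+      spark.kubernetes.operator.reconciler.intervalSeconds: "60"
+```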
+
+## Operator Health (Liveness) Probe with Sentinel Resource
+
+Borrowing from the
+[Apache Flink Operator](https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/docs/operations/health/#canary-resources),
+a dummy Spark application resource in any watched namespace can back the Spark operator's
+health probe.
+
+Below is an example of a sentinel resource. It carries the label
+`"spark.operator/sentinel": "true"` and does not result in the creation of any other
+Kubernetes resources. Controlled by the property
+`health.sentinel.resource.reconciliation.delay.seconds`, the timeout to reconcile the
+sentinel resources defaults to 60 seconds. If the operator cannot reconcile these
+resources within that time, the health probe returns HTTP code 500 when the kubelet sends
+an HTTP GET to the liveness endpoint, and the kubelet then kills and restarts the Spark
+operator container.
+
+```yaml
+apiVersion: spark.apache.org/v1alpha1
+kind: SparkApplication
+metadata:
+ name: spark-sentinel-resources
+ labels:
+ "spark.operator/sentinel": "true"
+```
diff --git a/docs/resources/application_state_machine.png
b/docs/resources/application_state_machine.png
new file mode 100644
index 0000000..3b3df3d
Binary files /dev/null and b/docs/resources/application_state_machine.png differ
diff --git a/docs/resources/cluster_state_machine.png
b/docs/resources/cluster_state_machine.png
new file mode 100644
index 0000000..2a8dcdd
Binary files /dev/null and b/docs/resources/cluster_state_machine.png differ
diff --git a/docs/resources/prometheus.png b/docs/resources/prometheus.png
new file mode 100644
index 0000000..5507d57
Binary files /dev/null and b/docs/resources/prometheus.png differ
diff --git a/docs/spark_custom_resources.md b/docs/spark_custom_resources.md
new file mode 100644
index 0000000..aef3377
--- /dev/null
+++ b/docs/spark_custom_resources.md
@@ -0,0 +1,227 @@
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+## Spark Operator API
+
+The core user-facing API of the Spark Kubernetes Operator consists of the
+`SparkApplication` and `SparkCluster` Custom Resource Definitions (CRDs). A Spark custom
+resource extends the standard k8s API, defines the Spark application spec, and tracks its
+status.
+
+Once the Spark Operator is installed and running in your Kubernetes environment, it
+continuously watches the SparkApplication(s) and SparkCluster(s) submitted by the user via
+a k8s API client or kubectl, and orchestrates secondary resources (pods, configmaps, etc.).
+
+Please also check the [quickstart](../README.md) for installing the operator.
+
+## SparkApplication
+
+A SparkApplication can be defined in YAML format. Users may configure the application
+entrypoint and its configuration. Let's start with the
+[Spark-Pi example](../examples/pi.yaml):
+
+```yaml
+apiVersion: spark.apache.org/v1alpha1
+kind: SparkApplication
+metadata:
+ name: pi
+spec:
+ # Entry point for the app
+ mainClass: "org.apache.spark.examples.SparkPi"
+ jars: "local:///opt/spark/examples/jars/spark-examples.jar"
+ sparkConf:
+ spark.dynamicAllocation.enabled: "true"
+ spark.dynamicAllocation.shuffleTracking.enabled: "true"
+ spark.dynamicAllocation.maxExecutors: "3"
+ spark.log.structuredLogging.enabled: "false"
+ spark.kubernetes.authenticate.driver.serviceAccountName: "spark"
+ spark.kubernetes.container.image: "apache/spark:4.0.0-preview2"
+ applicationTolerations:
+ resourceRetainPolicy: OnFailure
+ runtimeVersions:
+ scalaVersion: "2.13"
+ sparkVersion: "4.0.0-preview2"
+```
+
+After the application is submitted, the operator adds status information to your
+application based on the observed state:
+
+```shell
+kubectl get sparkapp pi -o yaml
+```
+
+### Write and build your SparkApplication
+
+It's straightforward to convert a spark-submit application to a `SparkApplication` yaml:
+the operator constructs the driver spec in a similar way. To submit a Java / Scala
+application, use `.spec.jars` and `.spec.mainClass`. Similarly, set `pyFiles` for Python
+applications.
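+
+For instance, a Python application could be sketched as follows, mirroring the Spark-Pi
+example above. The script path, resource name, and image tag are placeholders, not part of
+a shipped example:
+
+```yaml
+apiVersion: spark.apache.org/v1alpha1
+kind: SparkApplication
+metadata:
+  name: pyspark-example
+spec:
+  # entry point for a Python app; the path inside the image is a placeholder
+  pyFiles: "local:///opt/spark/examples/src/main/python/pi.py"
+  sparkConf:
+    spark.kubernetes.authenticate.driver.serviceAccountName: "spark"
+    spark.kubernetes.container.image: "apache/spark:4.0.0-preview2"
+  runtimeVersions:
+    sparkVersion: "4.0.0-preview2"
+```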
+
+When building images for the driver and executor, it's recommended to use the official
+[Spark Docker](https://github.com/apache/spark-docker) images as the base. Also check the
+pod template support (`.spec.driverSpec.podTemplateSpec` and
+`.spec.executorSpec.podTemplateSpec`) for setting a custom Spark home and work dir.
+
+### Pod Template Support
+
+It is possible to configure a pod template for driver & executor pods to set spec fields
+that are not configurable through SparkConf.
+
+The Spark Operator supports defining a pod template for driver and executor pods in two
+ways:
+
+1. Set `PodTemplateSpec` in `SparkApplication`
+2. Config `spark.kubernetes.[driver/executor].podTemplateFile`
+
+If the pod template spec is set in the application spec (option 1), it takes precedence
+over option 2, and `spark.kubernetes.[driver/executor].podTemplateFile` is unset to avoid
+conflicting overrides.
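+
+For example, option 1 might look like the following sketch. The labels and node selector
+are illustrative, not required fields:
+
+```yaml
+spec:
+  driverSpec:
+    podTemplateSpec:
+      metadata:
+        labels:
+          team: data-platform
+      spec:
+        # illustrative: pin driver pods to a dedicated node pool
+        nodeSelector:
+          node-pool: spark-drivers
+```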
+
+When the pod template is set as a remote file in conf properties (option 2), please ensure
+the Spark Operator has the necessary permissions to access the remote file location, e.g.
+deploy the operator with a proper workload identity that has access to the target S3 /
+Cloud Storage bucket. Similar permission requirements also apply to the driver pod: the
+operator needs access to the template file to create the driver, and the driver needs the
+same to create executors.
+
+Please be advised that Spark still overrides the necessary pod configuration in both
+options. For more details, refer to the
+[Spark doc](https://spark.apache.org/docs/latest/running-on-kubernetes.html#pod-template).
+
+## Understanding Failure Types
+
+In addition to the general `Failed` state (the driver pod fails, or the driver container
+exits with a non-zero code), the Spark Operator introduces a few more specific failure
+states for ease of high-level app status monitoring, and for ease of setting up different
+handlers when users create / manage SparkApplications with external microservices or
+workflow engines.
+
+The Spark Operator recognizes "infrastructure failures" on a best-effort basis. It is
+possible to configure different restart policies for general failure(s) vs. potential
+infrastructure failure(s). For example, you may configure the app to restart only upon
+infrastructure failures. If a Spark application fails as a result of
+
+```
+DriverStartTimedOut
+ExecutorsStartTimedOut
+SchedulingFailure
+```
+
+it is more likely that the app failed for infrastructure reasons, including scenarios
+where the driver or executors cannot be scheduled or cannot initialize within the
+configured time window for scheduler reasons: insufficient capacity, no IP allocated,
+images that cannot be pulled, k8s API server issues at scheduling time, etc.
+
+Please be advised that this is best-effort failure identification. You may still need to
+debug the actual failure from the driver pods. The Spark Operator stages the last observed
+driver pod status with the stopping state for audit purposes.
+
+## Configure the Tolerations for SparkApplication
+
+### Restart
+
+The Spark Operator enables configuring app restart behavior for different failure types.
+Here's a sample restart config snippet:
+
+``` yaml
+restartConfig:
+  # acceptable values are 'Never', 'Always', 'OnFailure' and 'OnInfrastructureFailure'
+  restartPolicy: Never
+  # the operator retries the application if configured. All resources from the current
+  # attempt are deleted before starting the next attempt
+  maxRestartAttempts: 3
+  # backoff time (in millis) that the operator waits before the next attempt
+  restartBackoffMillis: 30000
+```
+
+### Timeouts
+
+It's possible to configure applications to be proactively terminated and
resubmitted in particular
+cases to avoid resource deadlock.
+
+
+| Field                                                                                   | Type    | Default Value | Description                                                                     |
+|-----------------------------------------------------------------------------------------|---------|---------------|---------------------------------------------------------------------------------|
+| .spec.applicationTolerations.applicationTimeoutConfig.driverStartTimeoutMillis          | integer | 300000        | Time to wait for the driver to reach running state after it is requested.       |
+| .spec.applicationTolerations.applicationTimeoutConfig.executorStartTimeoutMillis        | integer | 300000        | Time to wait for the driver to acquire the minimal number of running executors. |
+| .spec.applicationTolerations.applicationTimeoutConfig.forceTerminationGracePeriodMillis | integer | 300000        | Time to wait before force-deleting resources at the end of an attempt.          |
+| .spec.applicationTolerations.applicationTimeoutConfig.driverReadyTimeoutMillis          | integer | 300000        | Time to wait for the driver to reach ready state.                               |
+| .spec.applicationTolerations.applicationTimeoutConfig.terminationRequeuePeriodMillis    | integer | 2000          | Back-off time before releasing resources is re-attempted for the application.   |
+
+
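+The fields above map onto a config block like the following sketch. The values shown are
+the defaults from the table and are illustrative:
+
+```yaml
+applicationTolerations:
+  applicationTimeoutConfig:
+    # fail the attempt if the driver is not running within 5 min of being requested
+    driverStartTimeoutMillis: 300000
+    # fail the attempt if the minimal executor count is not reached in time
+    executorStartTimeoutMillis: 300000
+    driverReadyTimeoutMillis: 300000
+    forceTerminationGracePeriodMillis: 300000
+    terminationRequeuePeriodMillis: 2000
+```
+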
+### Instance Config
+
+Instance Config helps the operator decide whether an application is running healthily.
+When the underlying cluster has a batch scheduler enabled, you may configure apps to be
+started if and only if there are sufficient resources. If, however, the cluster does not
+have a batch scheduler, the operator may help avoid apps hanging with an `InstanceConfig`
+that describes the bare minimal tolerable scenario.
+
+For example, with the spec below:
+
+```yaml
+applicationTolerations:
+ instanceConfig:
+ minExecutors: 3
+ initExecutors: 5
+ maxExecutors: 10
+sparkConf:
+ spark.executor.instances: "10"
+```
+
+Spark tries to bring up 10 executors as defined in the SparkConf. In addition, from the
+operator's perspective:
+
+* If the Spark app acquires fewer than 5 executors in the given time window
+  (`.spec.applicationTolerations.applicationTimeoutConfig.executorStartTimeoutMillis`)
+  after being submitted, it is shut down proactively in order to avoid resource deadlock.
+* The Spark app is marked as 'RunningWithBelowThresholdExecutors' if it loses executors
+  after starting up successfully.
+* The Spark app is marked as 'RunningHealthy' if it has at least the min number of
+  executors after starting up successfully.
+
+### Delete Resources On Termination
+
+By default, the operator deletes all created resources at the end of an attempt. It tries
+to record the last observed driver status in the `status` field of the application for
+troubleshooting purposes.
+
+On the other hand, when developing an application, it's possible to configure
+
+```yaml
+applicationTolerations:
+  # Acceptable values are 'Always', 'OnFailure', 'Never'
+  resourceRetainPolicy: OnFailure
+```
+
+to prevent the operator from deleting the driver pod and driver resources if the app
+fails. Similarly, if `resourceRetainPolicy` is set to `Always`, the operator does not
+delete driver resources when the app ends. Note that this applies only to
+operator-created resources (driver pod, SparkConf configmap, etc.). You may also want to
+tune `spark.kubernetes.driver.service.deleteOnTermination` and
+`spark.kubernetes.executor.deleteOnTermination` to control the behavior of driver-created
+resources.
+
+## Spark Cluster
+
+The Spark Operator also supports launching Spark clusters on k8s via the `SparkCluster`
+custom resource, which takes minimal effort to specify the desired master and worker
+instance spec.
+
+To deploy a Spark cluster, you may start by specifying the desired Spark version and
+worker count, as well as the SparkConf, as in the
+[example](../examples/qa-cluster-with-one-worker.yaml). Master & worker instances are
+deployed as
+[StatefulSets](https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/)
+and exposed via k8s
+[service(s)](https://kubernetes.io/docs/concepts/services-networking/service/).
+
+Like the pod template support for applications, it's also possible to submit template(s)
+for the Spark instances of a `SparkCluster` to configure spec fields that are not
+supported via SparkConf. It's worth noting that Spark may overwrite certain fields.
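+
+As an illustration only, a minimal `SparkCluster` might look like the sketch below. The
+field names here are assumptions modeled on the `SparkApplication` examples above;
+consult the linked example and the CRD for the authoritative schema:
+
+```yaml
+apiVersion: spark.apache.org/v1alpha1
+kind: SparkCluster
+metadata:
+  name: qa-cluster
+spec:
+  runtimeVersions:
+    sparkVersion: "4.0.0-preview2"
+  sparkConf:
+    spark.kubernetes.container.image: "apache/spark:4.0.0-preview2"
+```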
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]