This is an automated email from the ASF dual-hosted git repository.
amoghrajesh pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/airflow.git
The following commit(s) were added to refs/heads/main by this push:
new 1d3fcb0b1e4 Document REST scheme and port connection fields for Spark (#67682)
1d3fcb0b1e4 is described below
commit 1d3fcb0b1e4609447ab3ee9d39739d5d09e0aa05
Author: Amogh Desai <[email protected]>
AuthorDate: Fri May 29 14:04:22 2026 +0530
Document REST scheme and port connection fields for Spark (#67682)
---
.pre-commit-config.yaml | 1 +
providers/apache/spark/docs/connections/spark-submit.rst | 9 +++++++++
providers/apache/spark/docs/operators.rst | 12 ++++++++++++
3 files changed, 22 insertions(+)
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
index 42ab2035ec0..c5d02ea0a84 100644
--- a/.pre-commit-config.yaml
+++ b/.pre-commit-config.yaml
@@ -712,6 +712,7 @@ repos:
^providers/apache/kafka/docs/connections/kafka\.rst$|
^providers/apache/spark/docs/decorators/pyspark\.rst$|
^providers/apache/spark/docs/connections/spark-submit.rst$|
+ ^providers/apache/spark/docs/operators\.rst$|
^providers/apache/spark/src/airflow/providers/apache/spark/decorators/|
^providers/apache/spark/src/airflow/providers/apache/spark/hooks/|
^providers/apache/spark/src/airflow/providers/apache/spark/operators/|
diff --git a/providers/apache/spark/docs/connections/spark-submit.rst b/providers/apache/spark/docs/connections/spark-submit.rst
index 498796d016b..28c45e4c689 100644
--- a/providers/apache/spark/docs/connections/spark-submit.rst
+++ b/providers/apache/spark/docs/connections/spark-submit.rst
@@ -49,6 +49,15 @@ Spark binary (optional)
Kubernetes namespace (optional, only applies to spark on kubernetes applications)
Kubernetes namespace (``spark.kubernetes.namespace``) to divide cluster
resources between multiple users (via resource quota).
+REST scheme (optional, only applies to Spark standalone cluster mode)
+ Scheme used to reach the Spark standalone REST API (``http`` or ``https``). Defaults to ``http``.
+ Set to ``https`` when the Spark master REST API is TLS-enabled
+ (``spark.ssl.standalone.enabled=true``).
+
+REST port (optional, only applies to Spark standalone cluster mode)
+ Port of the Spark standalone REST API (``spark.master.rest.port``). Defaults to ``6066``.
+ Override when your cluster uses a non-default REST port.
+
.. note::
When specifying the connection in environment variable you should specify
diff --git a/providers/apache/spark/docs/operators.rst b/providers/apache/spark/docs/operators.rst
index 0a645542323..64af53454f4 100644
--- a/providers/apache/spark/docs/operators.rst
+++ b/providers/apache/spark/docs/operators.rst
@@ -199,6 +199,18 @@ a crash-safety net for teams running sync operators for log observability, org c
because a Triggerer is not available. Teams with a Triggerer available may also consider
deferrable operators, which free the worker slot but may come with added complexity.
+**Connection requirements for crash recovery**
+
+The reconnection polling calls the Spark standalone REST API
+(``GET /v1/submissions/status/{driverId}``). Make sure the Spark connection's
+``REST scheme`` and ``REST port`` extras match your cluster's configuration:
+
+* ``REST scheme`` — set to ``https`` if your cluster has TLS enabled on the REST port
+ (``spark.ssl.standalone.enabled=true``). Defaults to ``http``.
+* ``REST port`` — set to the value of ``spark.master.rest.port`` on your cluster. Defaults to ``6066``.
+
+See :doc:`connections/spark-submit` for how to configure these fields.
+
.. note::
Crash recovery in cluster mode requires Airflow 3.3+ (``task_state`` support). On earlier
versions the operator falls back to the previous behavior of always submitting fresh.
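
For illustration only, here is a minimal sketch of how the ``REST scheme`` and ``REST port`` values documented in this commit could combine with the master host into the status endpoint (``GET /v1/submissions/status/{driverId}``). The function name ``build_status_url``, the host name, and the driver ID are hypothetical and are not taken from the provider's code; only the endpoint path and the defaults (``http``, ``6066``) come from the documentation above.

```python
from urllib.parse import urlunsplit


def build_status_url(host, driver_id, rest_scheme="http", rest_port=6066):
    """Build the Spark standalone REST status URL for a driver.

    Hypothetical helper: the defaults mirror the documented connection
    defaults (scheme ``http``, port ``6066``); it is not the provider's
    actual implementation.
    """
    if rest_scheme not in ("http", "https"):
        raise ValueError(f"REST scheme must be 'http' or 'https', got {rest_scheme!r}")
    netloc = f"{host}:{int(rest_port)}"
    path = f"/v1/submissions/status/{driver_id}"
    # urlunsplit takes (scheme, netloc, path, query, fragment)
    return urlunsplit((rest_scheme, netloc, path, "", ""))


# Example with a made-up host and driver ID, using the documented defaults
default_url = build_status_url("spark-master.example.com", "driver-0001")

# Example for a TLS-enabled cluster on a non-default REST port
tls_url = build_status_url(
    "spark-master.example.com", "driver-0001", rest_scheme="https", rest_port=6443
)
```

If the scheme or port in the connection does not match the cluster, a URL built this way would point at the wrong endpoint, which is why the documentation asks for both fields to be checked before relying on crash recovery.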