This is an automated email from the ASF dual-hosted git repository.
dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new 032dcf89c19b [SPARK-53926][DOCS] Document newly added `core` module configurations
032dcf89c19b is described below
commit 032dcf89c19bf05c550b90edd9491f3f0a756523
Author: Dongjoon Hyun <[email protected]>
AuthorDate: Wed Oct 15 19:02:34 2025 -0700
[SPARK-53926][DOCS] Document newly added `core` module configurations
### What changes were proposed in this pull request?
This PR aims to document newly added `core` module configurations as a part of Apache Spark 4.1.0 preparation.
### Why are the changes needed?
To help users use the new features easily. The configurations documented here come from the following PRs; a brief usage sketch follows the list.
- https://github.com/apache/spark/pull/47856
- https://github.com/apache/spark/pull/51130
- https://github.com/apache/spark/pull/51163
- https://github.com/apache/spark/pull/51604
- https://github.com/apache/spark/pull/51630
- https://github.com/apache/spark/pull/51708
- https://github.com/apache/spark/pull/51885
- https://github.com/apache/spark/pull/52091
- https://github.com/apache/spark/pull/52382
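As a quick, non-authoritative illustration (not part of the committed docs), the console-redirect settings documented by this patch could be wired together roughly as below. The config keys and the plugin class come from the diff; the app name, master, and chosen values are placeholders.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// Minimal sketch: send driver/executor console output to the logging system.
// Keys and the plugin class are taken from the documentation added in this patch;
// the app name, master, and values are illustrative only.
val conf = new SparkConf()
  .setAppName("redirect-console-example")   // placeholder name
  .setMaster("local[*]")                    // placeholder for local testing
  .set("spark.plugins", "org.apache.spark.deploy.RedirectConsolePlugin")
  .set("spark.driver.log.redirectConsoleOutputs", "stdout,stderr") // default per the docs
  .set("spark.executor.logs.redirectConsoleOutputs", "stderr")     // e.g. only redirect stderr

val spark = SparkSession.builder().config(conf).getOrCreate()
```

The same keys could equally be passed with `--conf` at submit time or placed in `spark-defaults.conf`; since the plugin is loaded when the driver starts, the settings need to be in place before the `SparkSession` is created.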
### Does this PR introduce _any_ user-facing change?
No behavior change because this is a documentation update.
### How was this patch tested?
Manual review.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #52626 from dongjoon-hyun/SPARK-53926.
Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
---
docs/configuration.md | 109 ++++++++++++++++++++++++++++++++++++++++++++++++++
docs/monitoring.md | 8 ++++
2 files changed, 117 insertions(+)
diff --git a/docs/configuration.md b/docs/configuration.md
index b999a6ee2577..e9dbfa2b4f03 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -523,6 +523,16 @@ of the most common options to set are:
</td>
<td>3.0.0</td>
</tr>
+<tr>
+ <td><code>spark.driver.log.redirectConsoleOutputs</code></td>
+ <td>stdout,stderr</td>
+ <td>
+ Comma-separated list of console output kinds for the driver that need to be redirected
+ to the logging system. Supported values are `stdout` and `stderr`. It only takes effect when
+ `spark.plugins` is configured with `org.apache.spark.deploy.RedirectConsolePlugin`.
+ </td>
+ <td>4.1.0</td>
+</tr>
<tr>
<td><code>spark.decommission.enabled</code></td>
<td>false</td>
@@ -772,6 +782,16 @@ Apart from these, the following properties are also available, and may be useful
</td>
<td>1.1.0</td>
</tr>
+<tr>
+ <td><code>spark.executor.logs.redirectConsoleOutputs</code></td>
+ <td>stdout,stderr</td>
+ <td>
+ Comma-separated list of console output kinds for executors that need to be redirected
+ to the logging system. Supported values are `stdout` and `stderr`. It only takes effect when
+ `spark.plugins` is configured with `org.apache.spark.deploy.RedirectConsolePlugin`.
+ </td>
+ <td>4.1.0</td>
+</tr>
<tr>
<td><code>spark.executor.userClassPathFirst</code></td>
<td>false</td>
@@ -857,6 +877,47 @@ Apart from these, the following properties are also available, and may be useful
</td>
<td>1.2.0</td>
</tr>
+<tr>
+ <td><code>spark.python.factory.idleWorkerMaxPoolSize</code></td>
+ <td>(none)</td>
+ <td>
+ Maximum number of idle Python workers to keep. If unset, the number is unbounded.
+ If set to a positive integer N, at most N idle workers are retained;
+ least-recently used workers are evicted first.
+ </td>
+ <td>4.1.0</td>
+</tr>
+<tr>
+ <td><code>spark.python.worker.killOnIdleTimeout</code></td>
+ <td>false</td>
+ <td>
+ Whether Spark should terminate the Python worker process when the idle timeout
+ (as defined by <code>spark.python.worker.idleTimeoutSeconds</code>) is reached. If enabled,
+ Spark will terminate the Python worker process in addition to logging the status.
+ </td>
+ <td>4.1.0</td>
+</tr>
+<tr>
+ <td><code>spark.python.worker.tracebackDumpIntervalSeconds</code></td>
+ <td>0</td>
+ <td>
+ The interval (in seconds) for Python workers to dump their tracebacks.
+ If it is positive, the Python worker will periodically dump the traceback into
+ its `stderr`. The default is `0`, which means it is disabled.
+ </td>
+ <td>4.1.0</td>
+</tr>
+<tr>
+ <td><code>spark.python.unix.domain.socket.enabled</code></td>
+ <td>false</td>
+ <td>
+ When set to true, the Python driver uses a Unix domain socket for operations like
+ creating or collecting a DataFrame from local data, using accumulators, and executing
+ Python functions with PySpark such as Python UDFs. This configuration only applies
+ to Spark Classic and Spark Connect server.
+ </td>
+ <td>4.1.0</td>
+</tr>
<tr>
<td><code>spark.files</code></td>
<td></td>
@@ -873,6 +934,16 @@ Apart from these, the following properties are also available, and may be useful
</td>
<td>1.0.1</td>
</tr>
+<tr>
+ <td><code>spark.submit.callSystemExitOnMainExit</code></td>
+ <td>false</td>
+ <td>
+ If true, SparkSubmit will call System.exit() to initiate JVM shutdown once the
+ user's main method has exited. This can be useful in cases where non-daemon JVM
+ threads might otherwise prevent the JVM from shutting down on its own.
+ </td>
+ <td>4.1.0</td>
+</tr>
<tr>
<td><code>spark.jars</code></td>
<td></td>
@@ -1431,6 +1502,14 @@ Apart from these, the following properties are also available, and may be useful
</td>
<td>3.0.0</td>
</tr>
+<tr>
+ <td><code>spark.eventLog.excludedPatterns</code></td>
+ <td>(none)</td>
+ <td>
+ Specifies comma-separated event names to be excluded from the event logs.
+ </td>
+ <td>4.1.0</td>
+</tr>
<tr>
<td><code>spark.eventLog.dir</code></td>
<td>file:///tmp/spark-events</td>
@@ -1905,6 +1984,15 @@ Apart from these, the following properties are also available, and may be useful
</td>
<td>3.2.0</td>
</tr>
+<tr>
+ <td><code>spark.io.compression.zstd.strategy</code></td>
+ <td>(none)</td>
+ <td>
+ Compression strategy for the Zstd compression codec. The higher the value, the more
+ complex the compression becomes, usually resulting in stronger but slower compression or higher CPU cost.
+ </td>
+ <td>4.1.0</td>
+</tr>
<tr>
<td><code>spark.io.compression.zstd.workers</code></td>
<td>0</td>
@@ -2092,6 +2180,17 @@ Apart from these, the following properties are also available, and may be useful
</td>
<td>1.6.0</td>
</tr>
+<tr>
+ <td><code>spark.memory.unmanagedMemoryPollingInterval</code></td>
+ <td>0s</td>
+ <td>
+ Interval for polling unmanaged memory users to track their memory usage.
+ Unmanaged memory users are components that manage their own memory outside of
+ Spark's core memory management, such as RocksDB for Streaming State Store.
+ Setting this to 0 disables unmanaged memory polling.
+ </td>
+ <td>4.1.0</td>
+</tr>
<tr>
<td><code>spark.storage.unrollMemoryThreshold</code></td>
<td>1024 * 1024</td>
@@ -2543,6 +2642,16 @@ Apart from these, the following properties are also available, and may be useful
</td>
<td>0.7.0</td>
</tr>
+<tr>
+ <td><code>spark.driver.metrics.pollingInterval</code></td>
+ <td>10s</td>
+ <td>
+ How often to collect driver metrics (in milliseconds).
+ If unset, polling is done at the executor heartbeat interval;
+ if set, polling is done at this interval.
+ </td>
+ <td>4.1.0</td>
+</tr>
<tr>
<td><code>spark.rpc.io.backLog</code></td>
<td>64</td>
diff --git a/docs/monitoring.md b/docs/monitoring.md
index 49d04b328f29..e75f83110d19 100644
--- a/docs/monitoring.md
+++ b/docs/monitoring.md
@@ -401,6 +401,14 @@ Security options for the Spark History Server are covered in more detail in the
</td>
<td>3.0.0</td>
</tr>
+ <tr>
+ <td>spark.history.fs.eventLog.rolling.onDemandLoadEnabled</td>
+ <td>true</td>
+ <td>
+ Whether to look up rolling event log locations in an on-demand manner before listing files.
+ </td>
+ <td>4.1.0</td>
+ </tr>
<tr>
<td>spark.history.store.hybridStore.enabled</td>
<td>false</td>
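For reference only (this text is not part of the committed diff): a hedged sketch of how several of the other application-level settings documented above might be supplied programmatically. The keys are taken from the tables above; every value is an illustrative assumption, not a recommendation.

```scala
import org.apache.spark.SparkConf

// Sketch only: keys come from the documentation added above, values are placeholders.
val conf = new SparkConf()
  // Python worker lifecycle tuning (spark.python.* entries above).
  .set("spark.python.factory.idleWorkerMaxPoolSize", "8")        // cap the idle worker pool
  .set("spark.python.worker.killOnIdleTimeout", "true")          // kill workers that hit the idle timeout
  .set("spark.python.worker.tracebackDumpIntervalSeconds", "60") // periodic traceback dumps to stderr
  // Event log filtering; the event name here is only an example.
  .set("spark.eventLog.excludedPatterns", "SparkListenerBlockUpdated")
  // Zstd strategy: higher values trade CPU for compression ratio; "5" is arbitrary.
  .set("spark.io.compression.zstd.strategy", "5")
  // Poll unmanaged memory users (e.g. RocksDB state stores) every 10 seconds.
  .set("spark.memory.unmanagedMemoryPollingInterval", "10s")
  // Collect driver metrics at a fixed interval instead of on executor heartbeats.
  .set("spark.driver.metrics.pollingInterval", "30s")
```

The History Server key added in monitoring.md, `spark.history.fs.eventLog.rolling.onDemandLoadEnabled`, would normally go into the History Server's own configuration (e.g. the `spark-defaults.conf` it reads) rather than into an application's `SparkConf`.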
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]