This is an automated email from the ASF dual-hosted git repository.
dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new e6a76dfc0f00 [SPARK-53896][CORE] Enable `spark.io.compression.lzf.parallel.enabled` by default
e6a76dfc0f00 is described below
commit e6a76dfc0f00cb2be3e5a50a15682c9f2a863067
Author: Dongjoon Hyun <[email protected]>
AuthorDate: Mon Oct 13 23:19:37 2025 -0700
[SPARK-53896][CORE] Enable `spark.io.compression.lzf.parallel.enabled` by default
### What changes were proposed in this pull request?
This PR aims to enable `spark.io.compression.lzf.parallel.enabled` by default in Apache Spark 4.1.0.
### Why are the changes needed?
`spark.io.compression.lzf.parallel.enabled` was introduced in Apache Spark 4.0.0 and has been used stably since then, so we can now enable it by default.
- https://github.com/apache/spark/pull/46858
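For illustration, here is a minimal sketch of the two library-level paths this flag chooses between, using the `compress-lzf` library (`com.ning.compress`) that backs Spark's LZF codec. This is a hypothetical standalone demo rather than Spark's actual codec wiring; the payload and class usage are illustrative and assume `com.ning.compress:compress-lzf` is on the classpath.

```scala
import java.io.ByteArrayOutputStream

import com.ning.compress.lzf.LZFOutputStream
import com.ning.compress.lzf.parallel.PLZFOutputStream

object LzfParallelDemo {
  def main(args: Array[String]): Unit = {
    // Illustrative payload: 4 MiB of mildly repetitive bytes.
    val data = Array.tabulate[Byte](4 * 1024 * 1024)(i => (i % 64).toByte)

    // Single-threaded LZF: the path taken when the flag is false.
    val serialOut = new ByteArrayOutputStream()
    val serial = new LZFOutputStream(serialOut)
    serial.write(data)
    serial.close()

    // Parallel LZF: the path taken when the flag is true. The stream
    // chunks the input and compresses chunks on a background thread pool.
    val parallelOut = new ByteArrayOutputStream()
    val parallel = new PLZFOutputStream(parallelOut)
    parallel.write(data)
    parallel.close()

    println(s"serial: ${serialOut.size()} bytes, parallel: ${parallelOut.size()} bytes")
  }
}
```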
### Does this PR introduce _any_ user-facing change?
Yes, for `LZF` users. The migration guide is updated accordingly.
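For `LZF` users who want to keep the pre-4.1 behavior, a minimal opt-out sketch (the app name and local master below are placeholder values):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Placeholder app name and master; only the two compression settings matter here.
val conf = new SparkConf()
  .setAppName("lzf-parallel-opt-out")
  .setMaster("local[*]")
  .set("spark.io.compression.codec", "lzf")                  // the flag only matters for LZF
  .set("spark.io.compression.lzf.parallel.enabled", "false") // restore the pre-4.1 behavior
val sc = new SparkContext(conf)
```

The same setting can also be passed at submit time, e.g. `--conf spark.io.compression.lzf.parallel.enabled=false`.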
### How was this patch tested?
Pass the CIs.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #52603 from dongjoon-hyun/SPARK-53896.
Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
---
core/src/main/scala/org/apache/spark/internal/config/package.scala | 2 +-
docs/configuration.md | 2 +-
docs/core-migration-guide.md | 1 +
3 files changed, 3 insertions(+), 2 deletions(-)
diff --git a/core/src/main/scala/org/apache/spark/internal/config/package.scala b/core/src/main/scala/org/apache/spark/internal/config/package.scala
index d413d06ffc94..94fe31e1cd8c 100644
--- a/core/src/main/scala/org/apache/spark/internal/config/package.scala
+++ b/core/src/main/scala/org/apache/spark/internal/config/package.scala
@@ -2137,7 +2137,7 @@ package object config {
.doc("When true, LZF compression will use multiple threads to compress
data in parallel.")
.version("4.0.0")
.booleanConf
- .createWithDefault(false)
+ .createWithDefault(true)
private[spark] val IO_WARNING_LARGEFILETHRESHOLD =
ConfigBuilder("spark.io.warning.largeFileThreshold")
diff --git a/docs/configuration.md b/docs/configuration.md
index 573b485f7e2d..b999a6ee2577 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -1918,7 +1918,7 @@ Apart from these, the following properties are also available, and may be useful
</tr>
<tr>
<td><code>spark.io.compression.lzf.parallel.enabled</code></td>
- <td>false</td>
+ <td>true</td>
<td>
When true, LZF compression will use multiple threads to compress data in parallel.
</td>
diff --git a/docs/core-migration-guide.md b/docs/core-migration-guide.md
index a738363ace1d..19b77624d626 100644
--- a/docs/core-migration-guide.md
+++ b/docs/core-migration-guide.md
@@ -29,6 +29,7 @@ license: |
- Since Spark 4.1, Spark uses Apache Hadoop Magic Committer for all S3 buckets by default. To restore the behavior before Spark 4.1, you can set `spark.hadoop.fs.s3a.committer.magic.enabled=false`.
- Since Spark 4.1, `java.lang.InternalError` encountered during file reading will no longer fail the task if the configuration `spark.sql.files.ignoreCorruptFiles` or the data source option `ignoreCorruptFiles` is set to `true`.
- Since Spark 4.1, Spark ignores `*.blacklist.*` alternative configuration names. To restore the behavior before Spark 4.1, you can use the corresponding configuration names instead, which have existed since Spark 3.1.0.
+- Since Spark 4.1, Spark will use multiple threads for LZF compression to compress data in parallel. To restore the behavior before Spark 4.1, you can set `spark.io.compression.lzf.parallel.enabled` to `false`.
## Upgrading from Core 3.5 to 4.0
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]