This is an automated email from the ASF dual-hosted git repository.
dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new 25759a0de6d [SPARK-39743][DOCS] Improve `spark.io.compression.(lz4|snappy|zstd)` conf scope description
25759a0de6d is described below
commit 25759a0de6dd09ecc440d009fba6d661558e7261
Author: zzzzming95 <[email protected]>
AuthorDate: Wed Aug 10 12:10:54 2022 -0700
[SPARK-39743][DOCS] Improve `spark.io.compression.(lz4|snappy|zstd)` conf scope description
### What changes were proposed in this pull request?
Update some `spark.io.compression.*` configuration descriptions to clarify the scope in which these parameters apply.
### Why are the changes needed?
Users are easily confused and mistakenly assume that these parameters also apply to Spark SQL.
For example, a user wants to write a Parquet file with a specific compression setting such as the Zstd level (as in the code below). In the configuration documentation the user finds `spark.io.compression.zstd.level`, which looks like the desired setting, but after using it the option has no effect, which is confusing.
```
// Attempt to set the Zstd level for the Parquet write (has no effect on the output file):
sparkSession.sessionState.newHadoopConfWithOptions(
  Map("spark.io.compression.zstd.level" -> "10")
)
df.coalesce(1).write.parquet("file:///home/test_data/nn_parq_10")
```
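For context, Parquet output compression is controlled on the data source side rather than through `spark.io.compression.*`. A minimal sketch of what the user likely intended, reusing `sparkSession` and `df` from the snippet above; the Parquet-side level property `parquet.compression.codec.zstd.level` is an assumption and depends on the Parquet version:
```
// Assumption: the Parquet-side Zstd level is the parquet-hadoop property
// parquet.compression.codec.zstd.level (version dependent), set on the Hadoop conf.
sparkSession.sparkContext.hadoopConfiguration
  .setInt("parquet.compression.codec.zstd.level", 10)

// Pick the Parquet codec via the data source option (or set
// spark.sql.parquet.compression.codec for the whole session).
df.coalesce(1)
  .write
  .option("compression", "zstd")
  .parquet("file:///home/test_data/nn_parq_10")
```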
### Does this PR introduce _any_ user-facing change?
no
### How was this patch tested?
Closes #37416 from zzzzming95/SPARK-39743-doc.
Authored-by: zzzzming95 <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
---
docs/configuration.md | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)
diff --git a/docs/configuration.md b/docs/configuration.md
index 957c430c37b..55e595ad301 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -1530,7 +1530,8 @@ Apart from these, the following properties are also available, and may be useful
   <td>
     Block size used in LZ4 compression, in the case when LZ4 compression codec
     is used. Lowering this block size will also lower shuffle memory usage when LZ4 is used.
-    Default unit is bytes, unless otherwise specified.
+    Default unit is bytes, unless otherwise specified. This configuration only applies to
+    `spark.io.compression.codec`.
   </td>
   <td>1.4.0</td>
 </tr>
@@ -1540,7 +1541,8 @@ Apart from these, the following properties are also available, and may be useful
   <td>
     Block size in Snappy compression, in the case when Snappy compression codec is used.
     Lowering this block size will also lower shuffle memory usage when Snappy is used.
-    Default unit is bytes, unless otherwise specified.
+    Default unit is bytes, unless otherwise specified. This configuration only applies
+    to `spark.io.compression.codec`.
   </td>
   <td>1.4.0</td>
 </tr>
@@ -1549,7 +1551,8 @@ Apart from these, the following properties are also available, and may be useful
   <td>1</td>
   <td>
     Compression level for Zstd compression codec. Increasing the compression level will result in better
-    compression at the expense of more CPU and memory.
+    compression at the expense of more CPU and memory. This configuration only applies to
+    `spark.io.compression.codec`.
   </td>
   <td>2.3.0</td>
 </tr>
@@ -1559,7 +1562,8 @@ Apart from these, the following properties are also available, and may be useful
   <td>
     Buffer size in bytes used in Zstd compression, in the case when Zstd compression codec
     is used. Lowering this size will lower the shuffle memory usage when Zstd is used, but it
-    might increase the compression cost because of excessive JNI call overhead.
+    might increase the compression cost because of excessive JNI call overhead. This
+    configuration only applies to `spark.io.compression.codec`.
   </td>
   <td>2.3.0</td>
 </tr>
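For reference, a minimal sketch (not part of the patch) of where these settings do apply: they are read only by Spark's internal I/O codec selected with `spark.io.compression.codec`, which compresses internal data such as shuffle outputs, broadcast variables, RDD partitions and the event log. The app name is illustrative:
```
import org.apache.spark.sql.SparkSession

// spark.io.compression.* governs Spark-internal data, not data source output.
val spark = SparkSession.builder()
  .appName("io-codec-sketch")
  .config("spark.io.compression.codec", "zstd")
  .config("spark.io.compression.zstd.level", "3")        // read only because the codec above is zstd
  .config("spark.io.compression.zstd.bufferSize", "32k") // internal compression buffer, not Parquet
  .getOrCreate()
```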
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]