This is an automated email from the ASF dual-hosted git repository.
dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new 25759a0de6d [SPARK-39743][DOCS] Improve `spark.io.compression.(lz4|snappy|zstd)` conf scope description
25759a0de6d is described below
commit 25759a0de6dd09ecc440d009fba6d661558e7261
Author: zzzzming95 <[email protected]>
AuthorDate: Wed Aug 10 12:10:54 2022 -0700
[SPARK-39743][DOCS] Improve `spark.io.compression.(lz4|snappy|zstd)` conf scope description
### What changes were proposed in this pull request?
Update some `spark.io.compression.*` configuration descriptions to clarify the scope in which these parameters apply.
### Why are the changes needed?
Users are easily confused and mistakenly assume that these parameters also apply to Spark SQL.
For example, a user wants to write a Parquet file with a specific compression setting such as the Zstd level (as in the code below). In the configuration documentation the user finds `spark.io.compression.zstd.level`, which looks like the desired setting, but after using it the option has no effect, which is confusing.
```
// Attempt to set the Zstd level for the Parquet write (has no effect on the output file):
sparkSession.sessionState.newHadoopConfWithOptions(
  Map("spark.io.compression.zstd.level" -> "10")
)
df.coalesce(1).write.parquet("file:///home/test_data/nn_parq_10")
```
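For context, Parquet output compression is controlled on the data source side rather than through `spark.io.compression.*`. A minimal sketch of what the user likely intended, reusing `sparkSession` and `df` from the snippet above; the Parquet-side level property `parquet.compression.codec.zstd.level` is an assumption and depends on the Parquet version:
```
// Assumption: the Parquet-side Zstd level is the parquet-hadoop property
// parquet.compression.codec.zstd.level (version dependent), set on the Hadoop conf.
sparkSession.sparkContext.hadoopConfiguration
  .setInt("parquet.compression.codec.zstd.level", 10)

// Pick the Parquet codec via the data source option (or set
// spark.sql.parquet.compression.codec for the whole session).
df.coalesce(1)
  .write
  .option("compression", "zstd")
  .parquet("file:///home/test_data/nn_parq_10")
```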
### Does this PR introduce _any_ user-facing change?
no
### How was this patch tested?
Closes #37416 from zzzzming95/SPARK-39743-doc.
Authored-by: zzzzming95 <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
---
docs/configuration.md | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)
diff --git a/docs/configuration.md b/docs/configuration.md
index 957c430c37b..55e595ad301 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -1530,7 +1530,8 @@ Apart from these, the following properties are also available, and may be useful
   <td>
     Block size used in LZ4 compression, in the case when LZ4 compression codec
     is used. Lowering this block size will also lower shuffle memory usage when LZ4 is used.
-    Default unit is bytes, unless otherwise specified.
+    Default unit is bytes, unless otherwise specified. This configuration only applies to
+    `spark.io.compression.codec`.
   </td>
   <td>1.4.0</td>
 </tr>
@@ -1540,7 +1541,8 @@ Apart from these, the following properties are also available, and may be useful
   <td>
     Block size in Snappy compression, in the case when Snappy compression codec is used.
     Lowering this block size will also lower shuffle memory usage when Snappy is used.
-    Default unit is bytes, unless otherwise specified.
+    Default unit is bytes, unless otherwise specified. This configuration only applies
+    to `spark.io.compression.codec`.
   </td>
   <td>1.4.0</td>
 </tr>
@@ -1549,7 +1551,8 @@ Apart from these, the following properties are also available, and may be useful
   <td>1</td>
   <td>
     Compression level for Zstd compression codec. Increasing the compression level will result in better
-    compression at the expense of more CPU and memory.
+    compression at the expense of more CPU and memory. This configuration only applies to
+    `spark.io.compression.codec`.
   </td>
   <td>2.3.0</td>
 </tr>
@@ -1559,7 +1562,8 @@ Apart from these, the following properties are also available, and may be useful
   <td>
     Buffer size in bytes used in Zstd compression, in the case when Zstd compression codec
     is used. Lowering this size will lower the shuffle memory usage when Zstd is used, but it
-    might increase the compression cost because of excessive JNI call overhead.
+    might increase the compression cost because of excessive JNI call overhead. This
+    configuration only applies to `spark.io.compression.codec`.
   </td>
   <td>2.3.0</td>
 </tr>
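For reference, a minimal sketch (not part of the patch) of where these settings do apply: they are read only by Spark's internal I/O codec selected with `spark.io.compression.codec`, which compresses internal data such as shuffle outputs, broadcast variables, RDD partitions and the event log. The app name is illustrative:
```
import org.apache.spark.sql.SparkSession

// spark.io.compression.* governs Spark-internal data, not data source output.
val spark = SparkSession.builder()
  .appName("io-codec-sketch")
  .config("spark.io.compression.codec", "zstd")
  .config("spark.io.compression.zstd.level", "3")        // read only because the codec above is zstd
  .config("spark.io.compression.zstd.bufferSize", "32k") // internal compression buffer, not Parquet
  .getOrCreate()
```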
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]