This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

commit 8af58ebd958813e9ff29d2f0d1b070d529ba1275
Author: beliefer <[email protected]>
AuthorDate: Thu Apr 2 16:01:54 2020 +0900

    [SPARK-31295][DOC][FOLLOWUP] Supplement versions for configurations appearing in docs
    
    ### What changes were proposed in this pull request?
    This PR supplements the version information for configurations that appear in the docs.
    The information I sorted out is shown below.
    
    **docs/sql-performance-tuning.md**
    Item name | Since version | JIRA ID | Commit ID | Note
    -- | -- | -- | -- | --
    spark.sql.inMemoryColumnarStorage.compressed | 1.0.1 | SPARK-2631 | 86534d0f5255362618c05a07b0171ec35c915822#diff-41ef65b9ef5b518f77e2a03559893f4d |
    spark.sql.inMemoryColumnarStorage.batchSize | 1.1.1 | SPARK-2650 | 779d1eb26d0f031791e93c908d51a59c3b422a55#diff-41ef65b9ef5b518f77e2a03559893f4d |
    spark.sql.files.maxPartitionBytes | 2.0.0 | SPARK-13664 | 17eec0a71ba8713c559d641e3f43a1be726b037c#diff-32bb9518401c0948c5ea19377b5069ab |
    spark.sql.files.openCostInBytes | 2.0.0 | SPARK-14259 | 400b2f863ffaa01a34a8dae1541c61526fef908b#diff-32bb9518401c0948c5ea19377b5069ab |
    spark.sql.broadcastTimeout | 1.3.0 | SPARK-4269 | fa66ef6c97e87c9255b67b03836a4ba50598ebae#diff-41ef65b9ef5b518f77e2a03559893f4d |
    spark.sql.autoBroadcastJoinThreshold | 1.1.0 | SPARK-2393 | c7db274be79f448fda566208946cb50958ea9b1a#diff-41ef65b9ef5b518f77e2a03559893f4d |
    spark.sql.shuffle.partitions | 1.1.0 | SPARK-1508 | 08ed9ad81397b71206c4dc903bfb94b6105691ed#diff-41ef65b9ef5b518f77e2a03559893f4d |
    spark.sql.adaptive.coalescePartitions.enabled | 3.0.0 | SPARK-31037 | 46b7f1796bd0b96977ce9b473601033f397a3b18#diff-9a6b543db706f1a90f790783d6930a13 |
    spark.sql.adaptive.coalescePartitions.minPartitionNum | 3.0.0 | SPARK-31037 | 46b7f1796bd0b96977ce9b473601033f397a3b18#diff-9a6b543db706f1a90f790783d6930a13 |
    spark.sql.adaptive.coalescePartitions.initialPartitionNum | 3.0.0 | SPARK-31037 | 46b7f1796bd0b96977ce9b473601033f397a3b18#diff-9a6b543db706f1a90f790783d6930a13 |
    spark.sql.adaptive.advisoryPartitionSizeInBytes | 3.0.0 | SPARK-31037 | 46b7f1796bd0b96977ce9b473601033f397a3b18#diff-9a6b543db706f1a90f790783d6930a13 |
    spark.sql.adaptive.skewJoin.enabled | 3.0.0 | SPARK-31037 | 46b7f1796bd0b96977ce9b473601033f397a3b18#diff-9a6b543db706f1a90f790783d6930a13 |
    spark.sql.adaptive.skewJoin.skewedPartitionFactor | 3.0.0 | SPARK-31037 | 46b7f1796bd0b96977ce9b473601033f397a3b18#diff-9a6b543db706f1a90f790783d6930a13 |
    spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes | 3.0.0 | SPARK-31201 | 8d0800a0803d3c47938bddefa15328d654739bc5#diff-9a6b543db706f1a90f790783d6930a13 |
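
    For reference, all of the options above can be set per session with the `SET key=value` SQL syntax the tuning doc describes. A minimal sketch (values are illustrative, not recommendations):

    ```sql
    -- Illustrative session-level settings; the adaptive options require Spark 3.0.0+
    SET spark.sql.shuffle.partitions=200;
    SET spark.sql.adaptive.enabled=true;
    SET spark.sql.adaptive.coalescePartitions.enabled=true;
    SET spark.sql.adaptive.advisoryPartitionSizeInBytes=64MB;
    ```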
    
    **docs/sql-ref-ansi-compliance.md**
    Item name | Since version | JIRA ID | Commit ID | Note
    -- | -- | -- | -- | --
    spark.sql.ansi.enabled | 3.0.0 | SPARK-30125 | d9b30694122f8716d3acb448638ef1e2b96ebc7a#diff-9a6b543db706f1a90f790783d6930a13 |
    spark.sql.storeAssignmentPolicy | 3.0.0 | SPARK-28730 | 895c90b582cc2b2667241f66d5b733852aeef9eb#diff-9a6b543db706f1a90f790783d6930a13 |
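
    As a quick sketch of the behavior that `spark.sql.ansi.enabled` toggles (per the doc text; the exact exception type and message vary by version):

    ```sql
    -- With the default (false), integer overflow wraps around silently.
    -- With ANSI mode enabled (Spark 3.0.0+), the same query fails at runtime:
    SET spark.sql.ansi.enabled=true;
    SELECT 2147483647 + 1;  -- raises an arithmetic overflow error instead of returning -2147483648
    ```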
    
    ### Why are the changes needed?
    To supplement missing version information for these configurations.
    
    ### Does this PR introduce any user-facing change?
    'No'.
    
    ### How was this patch tested?
    Jenkins test
    
    Closes #28096 from beliefer/supplement-version-of-performance.
    
    Authored-by: beliefer <[email protected]>
    Signed-off-by: HyukjinKwon <[email protected]>
---
 docs/sql-performance-tuning.md  | 30 ++++++++++++++++++++++--------
 docs/sql-ref-ansi-compliance.md |  4 +++-
 2 files changed, 25 insertions(+), 9 deletions(-)

diff --git a/docs/sql-performance-tuning.md b/docs/sql-performance-tuning.md
index 9a1cc89..279aad6 100644
--- a/docs/sql-performance-tuning.md
+++ b/docs/sql-performance-tuning.md
@@ -35,7 +35,7 @@ Configuration of in-memory caching can be done using the `setConf` method on `Sp
 `SET key=value` commands using SQL.
 
 <table class="table">
-<tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>
+<tr><th>Property Name</th><th>Default</th><th>Meaning</th><th>Since Version</th></tr>
 <tr>
   <td><code>spark.sql.inMemoryColumnarStorage.compressed</code></td>
   <td>true</td>
@@ -43,6 +43,7 @@ Configuration of in-memory caching can be done using the `setConf` method on `Sp
     When set to true Spark SQL will automatically select a compression codec for each column based
     on statistics of the data.
   </td>
+  <td>1.0.1</td>
 </tr>
 <tr>
   <td><code>spark.sql.inMemoryColumnarStorage.batchSize</code></td>
@@ -51,6 +52,7 @@ Configuration of in-memory caching can be done using the `setConf` method on `Sp
     Controls the size of batches for columnar caching. Larger batch sizes can improve memory utilization
     and compression, but risk OOMs when caching data.
   </td>
+  <td>1.1.1</td>
 </tr>
 
 </table>
@@ -61,7 +63,7 @@ The following options can also be used to tune the performance of query executio
 that these options will be deprecated in future release as more optimizations are performed automatically.
 
 <table class="table">
-  <tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>
+  <tr><th>Property Name</th><th>Default</th><th>Meaning</th><th>Since Version</th></tr>
   <tr>
     <td><code>spark.sql.files.maxPartitionBytes</code></td>
     <td>134217728 (128 MB)</td>
@@ -69,6 +71,7 @@ that these options will be deprecated in future release as more optimizations ar
       The maximum number of bytes to pack into a single partition when reading files.
       This configuration is effective only when using file-based sources such as Parquet, JSON and ORC.
     </td>
+    <td>2.0.0</td>
   </tr>
   <tr>
     <td><code>spark.sql.files.openCostInBytes</code></td>
@@ -80,15 +83,17 @@ that these options will be deprecated in future release as more optimizations ar
       scheduled first). This configuration is effective only when using file-based sources such as Parquet,
       JSON and ORC.
     </td>
+    <td>2.0.0</td>
   </tr>
   <tr>
     <td><code>spark.sql.broadcastTimeout</code></td>
     <td>300</td>
     <td>
-    <p>
-      Timeout in seconds for the broadcast wait time in broadcast joins
-    </p>
+      <p>
+        Timeout in seconds for the broadcast wait time in broadcast joins
+      </p>
     </td>
+    <td>1.3.0</td>
   </tr>
   <tr>
     <td><code>spark.sql.autoBroadcastJoinThreshold</code></td>
@@ -99,6 +104,7 @@ that these options will be deprecated in future release as more optimizations ar
       statistics are only supported for Hive Metastore tables where the command
       <code>ANALYZE TABLE &lt;tableName&gt; COMPUTE STATISTICS noscan</code> has been run.
     </td>
+    <td>1.1.0</td>
   </tr>
   <tr>
     <td><code>spark.sql.shuffle.partitions</code></td>
@@ -106,6 +112,7 @@ that these options will be deprecated in future release as more optimizations ar
     <td>
       Configures the number of partitions to use when shuffling data for joins or aggregations.
     </td>
+    <td>1.1.0</td>
   </tr>
 </table>
 
@@ -193,13 +200,14 @@ Adaptive Query Execution (AQE) is an optimization technique in Spark SQL that ma
 ### Coalescing Post Shuffle Partitions
 This feature coalesces the post shuffle partitions based on the map output statistics when both `spark.sql.adaptive.enabled` and `spark.sql.adaptive.coalescePartitions.enabled` configurations are true. This feature simplifies the tuning of shuffle partition number when running queries. You do not need to set a proper shuffle partition number to fit your dataset. Spark can pick the proper shuffle partition number at runtime once you set a large enough initial number of shuffle partitions  [...]
  <table class="table">
-   <tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>
+   <tr><th>Property Name</th><th>Default</th><th>Meaning</th><th>Since Version</th></tr>
    <tr>
      <td><code>spark.sql.adaptive.coalescePartitions.enabled</code></td>
      <td>true</td>
      <td>
       When true and <code>spark.sql.adaptive.enabled</code> is true, Spark will coalesce contiguous shuffle partitions according to the target size (specified by <code>spark.sql.adaptive.advisoryPartitionSizeInBytes</code>), to avoid too many small tasks.
      </td>
+     <td>3.0.0</td>
    </tr>
   <tr>
     <td><code>spark.sql.adaptive.coalescePartitions.minPartitionNum</code></td>
@@ -207,6 +215,7 @@ This feature coalesces the post shuffle partitions based on the map output stati
      <td>
       The minimum number of shuffle partitions after coalescing. If not set, the default value is the default parallelism of the Spark cluster. This configuration only has an effect when <code>spark.sql.adaptive.enabled</code> and <code>spark.sql.adaptive.coalescePartitions.enabled</code> are both enabled.
      </td>
+     <td>3.0.0</td>
    </tr>
   <tr>
     <td><code>spark.sql.adaptive.coalescePartitions.initialPartitionNum</code></td>
@@ -214,6 +223,7 @@ This feature coalesces the post shuffle partitions based on the map output stati
      <td>
       The initial number of shuffle partitions before coalescing. By default it equals to <code>spark.sql.shuffle.partitions</code>. This configuration only has an effect when <code>spark.sql.adaptive.enabled</code> and <code>spark.sql.adaptive.coalescePartitions.enabled</code> are both enabled.
      </td>
+     <td>3.0.0</td>
    </tr>
    <tr>
      <td><code>spark.sql.adaptive.advisoryPartitionSizeInBytes</code></td>
@@ -221,6 +231,7 @@ This feature coalesces the post shuffle partitions based on the map output stati
      <td>
       The advisory size in bytes of the shuffle partition during adaptive optimization (when <code>spark.sql.adaptive.enabled</code> is true). It takes effect when Spark coalesces small shuffle partitions or splits skewed shuffle partition.
      </td>
+     <td>3.0.0</td>
    </tr>
  </table>
  
@@ -230,13 +241,14 @@ AQE converts sort-merge join to broadcast hash join when the runtime statistics
 ### Optimizing Skew Join
 Data skew can severely downgrade the performance of join queries. This feature dynamically handles skew in sort-merge join by splitting (and replicating if needed) skewed tasks into roughly evenly sized tasks. It takes effect when both `spark.sql.adaptive.enabled` and `spark.sql.adaptive.skewJoin.enabled` configurations are enabled.
   <table class="table">
-     <tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>
+     <tr><th>Property Name</th><th>Default</th><th>Meaning</th><th>Since Version</th></tr>
      <tr>
        <td><code>spark.sql.adaptive.skewJoin.enabled</code></td>
        <td>true</td>
        <td>
         When true and <code>spark.sql.adaptive.enabled</code> is true, Spark dynamically handles skew in sort-merge join by splitting (and replicating if needed) skewed partitions.
        </td>
+       <td>3.0.0</td>
      </tr>
      <tr>
        <td><code>spark.sql.adaptive.skewJoin.skewedPartitionFactor</code></td>
@@ -244,6 +256,7 @@ Data skew can severely downgrade the performance of join queries. This feature d
        <td>
         A partition is considered as skewed if its size is larger than this factor multiplying the median partition size and also larger than <code>spark.sql.adaptive.skewedPartitionThresholdInBytes</code>.
        </td>
+       <td>3.0.0</td>
      </tr>
     <tr>
       <td><code>spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes</code></td>
@@ -251,5 +264,6 @@ Data skew can severely downgrade the performance of join queries. This feature d
        <td>
         A partition is considered as skewed if its size in bytes is larger than this threshold and also larger than <code>spark.sql.adaptive.skewJoin.skewedPartitionFactor</code> multiplying the median partition size. Ideally this config should be set larger than <code>spark.sql.adaptive.advisoryPartitionSizeInBytes</code>.
        </td>
+       <td>3.0.0</td>
      </tr>
-   </table>
\ No newline at end of file
+   </table>
diff --git a/docs/sql-ref-ansi-compliance.md b/docs/sql-ref-ansi-compliance.md
index bc5bde6..83affb9 100644
--- a/docs/sql-ref-ansi-compliance.md
+++ b/docs/sql-ref-ansi-compliance.md
@@ -28,7 +28,7 @@ The casting behaviours are defined as store assignment rules in the standard.
 When `spark.sql.storeAssignmentPolicy` is set to `ANSI`, Spark SQL complies with the ANSI store assignment rules. This is a separate configuration because its default value is `ANSI`, while the configuration `spark.sql.ansi.enabled` is disabled by default.
 
 <table class="table">
-<tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>
+<tr><th>Property Name</th><th>Default</th><th>Meaning</th><th>Since Version</th></tr>
 <tr>
   <td><code>spark.sql.ansi.enabled</code></td>
   <td>false</td>
@@ -37,6 +37,7 @@ When `spark.sql.storeAssignmentPolicy` is set to `ANSI`, Spark SQL complies with
     1. Spark will throw a runtime exception if an overflow occurs in any operation on integral/decimal field.
     2. Spark will forbid using the reserved keywords of ANSI SQL as identifiers in the SQL parser.
   </td>
+  <td>3.0.0</td>
 </tr>
 <tr>
   <td><code>spark.sql.storeAssignmentPolicy</code></td>
@@ -52,6 +53,7 @@ When `spark.sql.storeAssignmentPolicy` is set to `ANSI`, Spark SQL complies with
     With strict policy, Spark doesn't allow any possible precision loss or data truncation in type coercion,
     e.g. converting double to int or decimal to double is not allowed.
   </td>
+  <td>3.0.0</td>
 </tr>
 </table>
 

