This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git
commit dec53bedbd4acbde0b7a6064476dbd148e4c6d08
Author: beliefer <[email protected]>
AuthorDate: Tue Mar 31 12:33:46 2020 +0900

[SPARK-31295][DOC] Supplement version for configuration appear in doc

### What changes were proposed in this pull request?
This PR supplements the version information for configurations that appear in the docs. I sorted out the information shown below.

**docs/spark-standalone.md**

Item name | Since version | JIRA ID | Commit ID | Note
-- | -- | -- | -- | --
spark.deploy.retainedApplications | 0.8.0 | None | 46eecd110a4017ea0c86cbb1010d0ccd6a5eb2ef#diff-29dffdccd5a7f4c8b496c293e87c8668 |
spark.deploy.retainedDrivers | 1.1.0 | None | 7446f5ff93142d2dd5c79c63fa947f47a1d4db8b#diff-29dffdccd5a7f4c8b496c293e87c8668 |
spark.deploy.spreadOut | 0.6.1 | None | bb2b9ff37cd2503cc6ea82c5dd395187b0910af0#diff-0e7ae91819fc8f7b47b0f97be7116325 |
spark.deploy.defaultCores | 0.9.0 | None | d8bcc8e9a095c1b20dd7a17b6535800d39bff80e#diff-29dffdccd5a7f4c8b496c293e87c8668 |
spark.deploy.maxExecutorRetries | 1.6.3 | SPARK-16956 | ace458f0330f22463ecf7cbee7c0465e10fba8a8#diff-29dffdccd5a7f4c8b496c293e87c8668 |
spark.worker.resource.{resourceName}.amount | 3.0.0 | SPARK-27371 | cbad616d4cb0c58993a88df14b5e30778c7f7e85#diff-d25032e4a3ae1b85a59e4ca9ccf189a8 |
spark.worker.resource.{resourceName}.discoveryScript | 3.0.0 | SPARK-27371 | cbad616d4cb0c58993a88df14b5e30778c7f7e85#diff-d25032e4a3ae1b85a59e4ca9ccf189a8 |
spark.worker.resourcesFile | 3.0.0 | SPARK-27369 | 7cbe01e8efc3f6cd3a0cac4bcfadea8fcc74a955#diff-b2fc8d6ab7ac5735085e2d6cfacb95da |
spark.shuffle.service.db.enabled | 3.0.0 | SPARK-26288 | 8b0aa59218c209d39cbba5959302d8668b885cf6#diff-6bdad48cfc34314e89599655442ff210 |
spark.storage.cleanupFilesAfterExecutorExit | 2.4.0 | SPARK-24340 | 8ef167a5f9ba8a79bb7ca98a9844fe9cfcfea060#diff-916ca56b663f178f302c265b7ef38499 |
spark.deploy.recoveryMode | 0.8.1 | None | d66c01f2b6defb3db6c1be99523b734a4d960532#diff-29dffdccd5a7f4c8b496c293e87c8668 |
spark.deploy.recoveryDirectory | 0.8.1 | None | d66c01f2b6defb3db6c1be99523b734a4d960532#diff-29dffdccd5a7f4c8b496c293e87c8668 |

**docs/sql-data-sources-avro.md**

Item name | Since version | JIRA ID | Commit ID | Note
-- | -- | -- | -- | --
spark.sql.legacy.replaceDatabricksSparkAvro.enabled | 2.4.0 | SPARK-25129 | ac0174e55af2e935d41545721e9f430c942b3a0c#diff-9a6b543db706f1a90f790783d6930a13 |
spark.sql.avro.compression.codec | 2.4.0 | SPARK-24881 | 0a0f68bae6c0a1bf30184b1e9ac6bf3805bd7511#diff-9a6b543db706f1a90f790783d6930a13 |
spark.sql.avro.deflate.level | 2.4.0 | SPARK-24881 | 0a0f68bae6c0a1bf30184b1e9ac6bf3805bd7511#diff-9a6b543db706f1a90f790783d6930a13 |

**docs/sql-data-sources-orc.md**

Item name | Since version | JIRA ID | Commit ID | Note
-- | -- | -- | -- | --
spark.sql.orc.impl | 2.3.0 | SPARK-20728 | 326f1d6728a7734c228d8bfaa69442a1c7b92e9b#diff-9a6b543db706f1a90f790783d6930a13 |
spark.sql.orc.enableVectorizedReader | 2.3.0 | SPARK-16060 | 60f6b994505e3f82091a04eed2dc0a9e8bd523ce#diff-9a6b543db706f1a90f790783d6930a13 |

**docs/sql-data-sources-parquet.md**

Item name | Since version | JIRA ID | Commit ID | Note
-- | -- | -- | -- | --
spark.sql.parquet.binaryAsString | 1.1.1 | SPARK-2927 | de501e169f24e4573747aec85b7651c98633c028#diff-41ef65b9ef5b518f77e2a03559893f4d |
spark.sql.parquet.int96AsTimestamp | 1.3.0 | SPARK-4987 | 67d52207b5cf2df37ca70daff2a160117510f55e#diff-41ef65b9ef5b518f77e2a03559893f4d |
spark.sql.parquet.compression.codec | 1.1.1 | SPARK-3131 | 3a9d874d7a46ab8b015631d91ba479d9a0ba827f#diff-41ef65b9ef5b518f77e2a03559893f4d |
spark.sql.parquet.filterPushdown | 1.2.0 | SPARK-4391 | 576688aa2a19bd4ba239a2b93af7947f983e5124#diff-41ef65b9ef5b518f77e2a03559893f4d |
spark.sql.hive.convertMetastoreParquet | 1.1.1 | SPARK-2406 | cc4015d2fa3785b92e6ab079b3abcf17627f7c56#diff-ff50aea397a607b79df9bec6f2a841db |
spark.sql.parquet.mergeSchema | 1.5.0 | SPARK-8690 | 246265f2bb056d5e9011d3331b809471a24ff8d7#diff-41ef65b9ef5b518f77e2a03559893f4d |
spark.sql.parquet.writeLegacyFormat | 1.6.0 | SPARK-10400 | 01cd688f5245cbb752863100b399b525b31c3510#diff-41ef65b9ef5b518f77e2a03559893f4d |

### Why are the changes needed?
Supplemental configuration version information.

### Does this PR introduce any user-facing change?
'No'.

### How was this patch tested?
Jenkins test

Closes #28064 from beliefer/supplement-doc-for-data-sources.

Authored-by: beliefer <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
---
 docs/spark-standalone.md         | 13 ++++++++++++-
 docs/sql-data-sources-avro.md    | 21 +++++++++++++++++----
 docs/sql-data-sources-orc.md     | 16 +++++++++++++---
 docs/sql-data-sources-parquet.md |  9 ++++++++-
 4 files changed, 50 insertions(+), 9 deletions(-)

diff --git a/docs/spark-standalone.md b/docs/spark-standalone.md
index 4d4b85e..2c2ed53 100644
--- a/docs/spark-standalone.md
+++ b/docs/spark-standalone.md
@@ -192,6 +192,7 @@ SPARK_MASTER_OPTS supports the following system properties:
   <td>
     The maximum number of completed applications to display. Older applications will be dropped from the UI to maintain this limit.<br/>
   </td>
+  <td>0.8.0</td>
 </tr>
 <tr>
   <td><code>spark.deploy.retainedDrivers</code></td>
@@ -199,6 +200,7 @@ SPARK_MASTER_OPTS supports the following system properties:
   <td>
     The maximum number of completed drivers to display. Older drivers will be dropped from the UI to maintain this limit.<br/>
   </td>
+  <td>1.1.0</td>
 </tr>
 <tr>
   <td><code>spark.deploy.spreadOut</code></td>
@@ -208,6 +210,7 @@ SPARK_MASTER_OPTS supports the following system properties:
     to consolidate them onto as few nodes as possible. Spreading out is usually better for
     data locality in HDFS, but consolidating is more efficient for compute-intensive workloads. <br/>
   </td>
+  <td>0.6.1</td>
 </tr>
 <tr>
   <td><code>spark.deploy.defaultCores</code></td>
@@ -219,6 +222,7 @@ SPARK_MASTER_OPTS supports the following system properties:
     Set this lower on a shared cluster to prevent users from grabbing the whole cluster by default.
     <br/>
   </td>
+  <td>0.9.0</td>
 </tr>
 <tr>
   <td><code>spark.deploy.maxExecutorRetries</code></td>
@@ -234,6 +238,7 @@ SPARK_MASTER_OPTS supports the following system properties:
     <code>-1</code>.
     <br/>
   </td>
+  <td>1.6.3</td>
 </tr>
 <tr>
   <td><code>spark.worker.timeout</code></td>
@@ -250,6 +255,7 @@ SPARK_MASTER_OPTS supports the following system properties:
   <td>
     Amount of a particular resource to use on the worker.
   </td>
+  <td>3.0.0</td>
 </tr>
 <tr>
   <td><code>spark.worker.resource.{resourceName}.discoveryScript</code></td>
@@ -258,6 +264,7 @@
     Path to resource discovery script, which is used to find a particular resource while worker starting up.
     And the output of the script should be formatted like the <code>ResourceInformation</code> class.
   </td>
+  <td>3.0.0</td>
 </tr>
 <tr>
   <td><code>spark.worker.resourcesFile</code></td>
@@ -317,6 +324,7 @@ SPARK_WORKER_OPTS supports the following system properties:
     enabled). You should also enable <code>spark.worker.cleanup.enabled</code>, to ensure that the state
     eventually gets cleaned up. This config may be removed in the future.
   </td>
+  <td>3.0.0</td>
 </tr>
 <tr>
   <td><code>spark.storage.cleanupFilesAfterExecutorExit</code></td>
@@ -329,6 +337,7 @@ SPARK_WORKER_OPTS supports the following system properties:
     all files/subdirectories of a stopped and timeout application. This only affects Standalone mode,
     support of other cluster manangers can be added in the future.
   </td>
+  <td>2.4.0</td>
 </tr>
 <tr>
   <td><code>spark.worker.ui.compressedLogFileLengthCacheSize</code></td>
@@ -490,14 +499,16 @@ ZooKeeper is the best way to go for production-level high availability, but if y
 In order to enable this recovery mode, you can set SPARK_DAEMON_JAVA_OPTS in spark-env using this configuration:

 <table class="table">
-  <tr><th style="width:21%">System property</th><th>Meaning</th></tr>
+  <tr><th style="width:21%">System property</th><th>Meaning</th><th>Since Version</th></tr>
   <tr>
     <td><code>spark.deploy.recoveryMode</code></td>
     <td>Set to FILESYSTEM to enable single-node recovery mode (default: NONE).</td>
+    <td>0.8.1</td>
   </tr>
   <tr>
     <td><code>spark.deploy.recoveryDirectory</code></td>
     <td>The directory in which Spark will store recovery state, accessible from the Master's perspective.</td>
+    <td>0.8.1</td>
   </tr>
 </table>

diff --git a/docs/sql-data-sources-avro.md b/docs/sql-data-sources-avro.md
index 8e6a407..d926ae7 100644
--- a/docs/sql-data-sources-avro.md
+++ b/docs/sql-data-sources-avro.md
@@ -258,21 +258,34 @@ Data source options of Avro can be set via:
 ## Configuration
 Configuration of Avro can be done using the `setConf` method on SparkSession or by running `SET key=value` commands using SQL.

 <table class="table">
-  <tr><th><b>Property Name</b></th><th><b>Default</b></th><th><b>Meaning</b></th></tr>
+  <tr><th><b>Property Name</b></th><th><b>Default</b></th><th><b>Meaning</b></th><th><b>Since Version</b></th></tr>
   <tr>
     <td>spark.sql.legacy.replaceDatabricksSparkAvro.enabled</td>
     <td>true</td>
-    <td>If it is set to true, the data source provider <code>com.databricks.spark.avro</code> is mapped to the built-in but external Avro data source module for backward compatibility.</td>
+    <td>
+      If it is set to true, the data source provider <code>com.databricks.spark.avro</code> is mapped
+      to the built-in but external Avro data source module for backward compatibility.
+    </td>
+    <td>2.4.0</td>
   </tr>
   <tr>
     <td>spark.sql.avro.compression.codec</td>
     <td>snappy</td>
-    <td>Compression codec used in writing of AVRO files. Supported codecs: uncompressed, deflate, snappy, bzip2 and xz. Default codec is snappy.</td>
+    <td>
+      Compression codec used in writing of AVRO files. Supported codecs: uncompressed, deflate,
+      snappy, bzip2 and xz. Default codec is snappy.
+    </td>
+    <td>2.4.0</td>
   </tr>
   <tr>
     <td>spark.sql.avro.deflate.level</td>
     <td>-1</td>
-    <td>Compression level for the deflate codec used in writing of AVRO files. Valid value must be in the range of from 1 to 9 inclusive or -1. The default value is -1 which corresponds to 6 level in the current implementation.</td>
+    <td>
+      Compression level for the deflate codec used in writing of AVRO files. Valid value must be in
+      the range of from 1 to 9 inclusive or -1. The default value is -1 which corresponds to 6 level
+      in the current implementation.
+    </td>
+    <td>2.4.0</td>
   </tr>
 </table>

diff --git a/docs/sql-data-sources-orc.md b/docs/sql-data-sources-orc.md
index bddffe0..4c4b3b1 100644
--- a/docs/sql-data-sources-orc.md
+++ b/docs/sql-data-sources-orc.md
@@ -27,15 +27,25 @@ serde tables (e.g., the ones created using the clause `USING HIVE OPTIONS (fileF
 the vectorized reader is used when `spark.sql.hive.convertMetastoreOrc` is also set to `true`.

 <table class="table">
-  <tr><th><b>Property Name</b></th><th><b>Default</b></th><th><b>Meaning</b></th></tr>
+  <tr><th><b>Property Name</b></th><th><b>Default</b></th><th><b>Meaning</b></th><th><b>Since Version</b></th></tr>
   <tr>
     <td><code>spark.sql.orc.impl</code></td>
     <td><code>native</code></td>
-    <td>The name of ORC implementation. It can be one of <code>native</code> and <code>hive</code>. <code>native</code> means the native ORC support. <code>hive</code> means the ORC library in Hive.</td>
+    <td>
+      The name of ORC implementation. It can be one of <code>native</code> and <code>hive</code>.
+      <code>native</code> means the native ORC support. <code>hive</code> means the ORC library
+      in Hive.
+    </td>
+    <td>2.3.0</td>
   </tr>
   <tr>
     <td><code>spark.sql.orc.enableVectorizedReader</code></td>
     <td><code>true</code></td>
-    <td>Enables vectorized orc decoding in <code>native</code> implementation. If <code>false</code>, a new non-vectorized ORC reader is used in <code>native</code> implementation. For <code>hive</code> implementation, this is ignored.</td>
+    <td>
+      Enables vectorized orc decoding in <code>native</code> implementation. If <code>false</code>,
+      a new non-vectorized ORC reader is used in <code>native</code> implementation.
+      For <code>hive</code> implementation, this is ignored.
+    </td>
+    <td>2.3.0</td>
   </tr>
 </table>

diff --git a/docs/sql-data-sources-parquet.md b/docs/sql-data-sources-parquet.md
index 53a1111..6e52446 100644
--- a/docs/sql-data-sources-parquet.md
+++ b/docs/sql-data-sources-parquet.md
@@ -258,7 +258,7 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession
 `SET key=value` commands using SQL.

 <table class="table">
-<tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>
+<tr><th>Property Name</th><th>Default</th><th>Meaning</th><th>Since Version</th></tr>
 <tr>
   <td><code>spark.sql.parquet.binaryAsString</code></td>
   <td>false</td>
@@ -267,6 +267,7 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession
     not differentiate between binary data and strings when writing out the Parquet schema. This
     flag tells Spark SQL to interpret binary data as a string to provide compatibility with these systems.
   </td>
+  <td>1.1.1</td>
 </tr>
 <tr>
   <td><code>spark.sql.parquet.int96AsTimestamp</code></td>
@@ -275,6 +276,7 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession
     Some Parquet-producing systems, in particular Impala and Hive, store Timestamp into INT96.
     This flag tells Spark SQL to interpret INT96 data as a timestamp to provide compatibility with these systems.
   </td>
+  <td>1.3.0</td>
 </tr>
 <tr>
   <td><code>spark.sql.parquet.compression.codec</code></td>
@@ -287,11 +289,13 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession
     Note that <code>zstd</code> requires <code>ZStandardCodec</code> to be installed before Hadoop 2.9.0,
     <code>brotli</code> requires <code>BrotliCodec</code> to be installed.
   </td>
+  <td>1.1.1</td>
 </tr>
 <tr>
   <td><code>spark.sql.parquet.filterPushdown</code></td>
   <td>true</td>
   <td>Enables Parquet filter push-down optimization when set to true.</td>
+  <td>1.2.0</td>
 </tr>
 <tr>
   <td><code>spark.sql.hive.convertMetastoreParquet</code></td>
@@ -300,6 +304,7 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession
     When set to false, Spark SQL will use the Hive SerDe for parquet tables instead of the built in
     support.
   </td>
+  <td>1.1.1</td>
 </tr>
 <tr>
   <td><code>spark.sql.parquet.mergeSchema</code></td>
@@ -310,6 +315,7 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession
       schema is picked from the summary file or a random data file if no summary file is available.
     </p>
   </td>
+  <td>1.5.0</td>
 </tr>
 <tr>
   <td><code>spark.sql.parquet.writeLegacyFormat</code></td>
@@ -321,5 +327,6 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession
     example, decimals will be written in int-based format. If Parquet output is intended for use with systems
     that do not support this newer format, set to true.
   </td>
+  <td>1.6.0</td>
 </tr>
 </table>

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
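For readers cross-checking the version data this commit adds, the "Since version" tables above can be captured as a small lookup. This is a minimal editorial sketch, not part of the patch; the `SINCE_VERSION` dict and `available_since` helper are illustrative names, populated only with the Parquet entries from the table above:

```python
# "Since version" values for the Parquet configurations, taken verbatim from
# the docs/sql-data-sources-parquet.md table in this commit (illustrative only).
SINCE_VERSION = {
    "spark.sql.parquet.binaryAsString": "1.1.1",
    "spark.sql.parquet.int96AsTimestamp": "1.3.0",
    "spark.sql.parquet.compression.codec": "1.1.1",
    "spark.sql.parquet.filterPushdown": "1.2.0",
    "spark.sql.hive.convertMetastoreParquet": "1.1.1",
    "spark.sql.parquet.mergeSchema": "1.5.0",
    "spark.sql.parquet.writeLegacyFormat": "1.6.0",
}

def available_since(key: str, spark_version: tuple) -> bool:
    """Return True if `key` already existed in the given (major, minor, patch) release."""
    since = tuple(int(part) for part in SINCE_VERSION[key].split("."))
    return spark_version >= since

print(available_since("spark.sql.parquet.filterPushdown", (1, 2, 0)))    # True
print(available_since("spark.sql.parquet.writeLegacyFormat", (1, 5, 0)))  # False
```

Tuple comparison gives lexicographic ordering over (major, minor, patch), which matches how these release numbers compare.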
