Repository: spark
Updated Branches:
  refs/heads/master 58f6e27dd -> 41e0ffb19

[SPARK-15894][SQL][DOC] Update docs for controlling #partitions

## What changes were proposed in this pull request?
Update docs for two parameters `spark.sql.files.maxPartitionBytes` and `spark.sql.files.openCostInBytes` in Other Configuration Options.

## How was this patch tested?
N/A

Author: Takeshi YAMAMURO <linguin....@gmail.com>

Closes #13797 from maropu/SPARK-15894-2.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/41e0ffb1
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/41e0ffb1
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/41e0ffb1

Branch: refs/heads/master
Commit: 41e0ffb19f678e9b1e87f747a5e4e3d44964e39a
Parents: 58f6e27
Author: Takeshi YAMAMURO <linguin....@gmail.com>
Authored: Tue Jun 21 14:27:16 2016 +0800
Committer: Cheng Lian <l...@databricks.com>
Committed: Tue Jun 21 14:27:16 2016 +0800

----------------------------------------------------------------------
 docs/sql-programming-guide.md | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/spark/blob/41e0ffb1/docs/sql-programming-guide.md
----------------------------------------------------------------------
diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md
index 4206f73..ddf8f70 100644
--- a/docs/sql-programming-guide.md
+++ b/docs/sql-programming-guide.md
@@ -2016,6 +2016,23 @@ that these options will be deprecated in future release as more optimizations ar
 <table class="table">
   <tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>
   <tr>
+    <td><code>spark.sql.files.maxPartitionBytes</code></td>
+    <td>134217728 (128 MB)</td>
+    <td>
+      The maximum number of bytes to pack into a single partition when reading files.
+    </td>
+  </tr>
+  <tr>
+    <td><code>spark.sql.files.openCostInBytes</code></td>
+    <td>4194304 (4 MB)</td>
+    <td>
+      The estimated cost to open a file, measured by the number of bytes that could be scanned in the
+      same time. This is used when putting multiple files into a partition. It is better to
+      over-estimate; then the partitions with small files will be faster than partitions with bigger
+      files (which are scheduled first).
+    </td>
+  </tr>
+  <tr>
     <td><code>spark.sql.autoBroadcastJoinThreshold</code></td>
     <td>10485760 (10 MB)</td>
     <td>
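
The two settings documented in this patch interact when many small files are packed into read partitions. The following is a minimal sketch, not Spark's actual implementation. It assumes a greedy packing in which files are taken largest first, each file is charged its length plus the open cost, and a partition is closed once the next file would push it past the byte limit. The helper name `pack_files` and the sample file sizes are hypothetical; only the two default values come from the table in the diff.

```python
# Illustrative sketch only; this is NOT Spark's source code.
MB = 1024 * 1024
MAX_PARTITION_BYTES = 134217728  # default of spark.sql.files.maxPartitionBytes (128 MB)
OPEN_COST_IN_BYTES = 4194304     # default of spark.sql.files.openCostInBytes (4 MB)


def pack_files(file_sizes,
               max_partition_bytes=MAX_PARTITION_BYTES,
               open_cost_in_bytes=OPEN_COST_IN_BYTES):
    """Greedily pack file sizes (in bytes) into partitions (hypothetical helper).

    Assumption: a file larger than max_partition_bytes is not split here;
    it simply gets a partition of its own.
    """
    partitions = []
    current = []
    current_size = 0
    # Largest files first, so partitions holding big files are formed first.
    for size in sorted(file_sizes, reverse=True):
        # Close the current partition if this file would overflow it.
        if current and current_size + size > max_partition_bytes:
            partitions.append(current)
            current = []
            current_size = 0
        current.append(size)
        # Charge the file's bytes plus the estimated cost of opening it.
        current_size += size + open_cost_in_bytes
    if current:
        partitions.append(current)
    return partitions


# Sixty 1 MB files, packed with and without an open cost.
small_files = [1 * MB] * 60
for cost in (0, OPEN_COST_IN_BYTES):
    parts = pack_files(small_files, open_cost_in_bytes=cost)
    print(f"open cost {cost}: files per partition {[len(p) for p in parts]}")
```

In this sketch, sixty 1 MB files land in a single partition when the open cost is zero, and in three partitions (26, 26 and 8 files) with the 4 MB default. A higher open cost therefore means fewer small files per partition, which is the effect the new `openCostInBytes` description is getting at.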