This is an automated email from the ASF dual-hosted git repository.

github-bot pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/datafusion-comet.git


The following commit(s) were added to refs/heads/asf-site by this push:
     new 0dc4d691 Publish built docs triggered by 0667c60b8817dcf4fa05ba218cc4a97a5d36c559
0dc4d691 is described below

commit 0dc4d6915e142477a659a6d07c9221896bfba6f9
Author: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
AuthorDate: Mon Oct 7 16:00:50 2024 +0000

    Publish built docs triggered by 0667c60b8817dcf4fa05ba218cc4a97a5d36c559
---
 _sources/user-guide/configs.md.txt |  5 +++++
 searchindex.js                     |  2 +-
 user-guide/configs.html            | 32 ++++++++++++++++++++++++++------
 3 files changed, 32 insertions(+), 7 deletions(-)

diff --git a/_sources/user-guide/configs.md.txt b/_sources/user-guide/configs.md.txt
index ff2db342..f7ef1d55 100644
--- a/_sources/user-guide/configs.md.txt
+++ b/_sources/user-guide/configs.md.txt
@@ -66,6 +66,11 @@ Comet provides the following configuration settings.
 | spark.comet.memory.overhead.min | Minimum amount of additional memory to be allocated per executor process for Comet, in MiB. | 402653184b |
 | spark.comet.nativeLoadRequired | Whether to require Comet native library to load successfully when Comet is enabled. If not, Comet will silently fallback to Spark when it fails to load the native lib. Otherwise, an error will be thrown and the Spark job will be aborted. | false |
 | spark.comet.parquet.enable.directBuffer | Whether to use Java direct byte buffer when reading Parquet. By default, this is false | false |
+| spark.comet.parquet.read.io.adjust.readRange.skew | In the parallel reader, if the read ranges submitted are skewed in sizes, this option will cause the reader to break up larger read ranges into smaller ranges to reduce the skew. This will result in a slightly larger number of connections opened to the file system but may give improved performance. The option is off by default. | false |
+| spark.comet.parquet.read.io.mergeRanges | When enabled, the parallel reader will try to merge ranges of data that are separated by less than 'comet.parquet.read.io.mergeRanges.delta' bytes. Longer continuous reads are faster on cloud storage. The default behavior is to merge consecutive ranges. | true |
+| spark.comet.parquet.read.io.mergeRanges.delta | The delta in bytes between consecutive read ranges below which the parallel reader will try to merge the ranges. The default is 8MB. | 8388608 |
+| spark.comet.parquet.read.parallel.io.enabled | Whether to enable Comet's parallel reader for Parquet files. The parallel reader reads ranges of consecutive data in a file in parallel. It is faster for large files and row groups but uses more resources. The parallel reader is enabled by default. | true |
+| spark.comet.parquet.read.parallel.io.thread-pool.size | The maximum number of parallel threads the parallel reader will use in a single executor. For executors configured with a smaller number of cores, use a smaller number. | 16 |
 | spark.comet.regexp.allowIncompatible | Comet is not currently fully compatible with Spark for all regular expressions. Set this config to true to allow them anyway using Rust's regular expression engine. See compatibility guide for more information. | false |
 | spark.comet.scan.enabled | Whether to enable native scans. When this is turned on, Spark will use Comet to read supported data sources (currently only Parquet is supported natively). Note that to enable native vectorized execution, both this config and 'spark.comet.exec.enabled' need to be enabled. By default, this config is true. | true |
 | spark.comet.scan.preFetch.enabled | Whether to enable pre-fetching feature of CometScan. By default is disabled. | false |
diff --git a/searchindex.js b/searchindex.js
index 1333fffe..a69b5051 100644
--- a/searchindex.js
+++ b/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"alltitles": {"1. Install Comet": [[9, "install-comet"]], "2. Clone Spark and Apply Diff": [[9, "clone-spark-and-apply-diff"]], "3. Run Spark SQL Tests": [[9, "run-spark-sql-tests"]], "ANSI mode": [[11, "ansi-mode"]], "API Differences Between Spark Versions": [[0, "api-differences-between-spark-versions"]], "ASF Links": [[10, null]], "Adding Spark-side Tests for the New Expression": [[0, "adding-spark-side-tests-for-the-new-expression"]], "Adding a New Expression": [[0,  [...]
\ No newline at end of file
+Search.setIndex({"alltitles": {"1. Install Comet": [[9, "install-comet"]], "2. Clone Spark and Apply Diff": [[9, "clone-spark-and-apply-diff"]], "3. Run Spark SQL Tests": [[9, "run-spark-sql-tests"]], "ANSI mode": [[11, "ansi-mode"]], "API Differences Between Spark Versions": [[0, "api-differences-between-spark-versions"]], "ASF Links": [[10, null]], "Adding Spark-side Tests for the New Expression": [[0, "adding-spark-side-tests-for-the-new-expression"]], "Adding a New Expression": [[0,  [...]
\ No newline at end of file
diff --git a/user-guide/configs.html b/user-guide/configs.html
index 88e901ec..64decce4 100644
--- a/user-guide/configs.html
+++ b/user-guide/configs.html
@@ -495,27 +495,47 @@ under the License.
 <td><p>Whether to use Java direct byte buffer when reading Parquet. By default, this is false</p></td>
 <td><p>false</p></td>
 </tr>
-<tr class="row-odd"><td><p>spark.comet.regexp.allowIncompatible</p></td>
+<tr class="row-odd"><td><p>spark.comet.parquet.read.io.adjust.readRange.skew</p></td>
+<td><p>In the parallel reader, if the read ranges submitted are skewed in sizes, this option will cause the reader to break up larger read ranges into smaller ranges to reduce the skew. This will result in a slightly larger number of connections opened to the file system but may give improved performance. The option is off by default.</p></td>
+<td><p>false</p></td>
+</tr>
+<tr class="row-even"><td><p>spark.comet.parquet.read.io.mergeRanges</p></td>
+<td><p>When enabled, the parallel reader will try to merge ranges of data that are separated by less than ‘comet.parquet.read.io.mergeRanges.delta’ bytes. Longer continuous reads are faster on cloud storage. The default behavior is to merge consecutive ranges.</p></td>
+<td><p>true</p></td>
+</tr>
+<tr class="row-odd"><td><p>spark.comet.parquet.read.io.mergeRanges.delta</p></td>
+<td><p>The delta in bytes between consecutive read ranges below which the parallel reader will try to merge the ranges. The default is 8MB.</p></td>
+<td><p>8388608</p></td>
+</tr>
+<tr class="row-even"><td><p>spark.comet.parquet.read.parallel.io.enabled</p></td>
+<td><p>Whether to enable Comet’s parallel reader for Parquet files. The parallel reader reads ranges of consecutive data in a file in parallel. It is faster for large files and row groups but uses more resources. The parallel reader is enabled by default.</p></td>
+<td><p>true</p></td>
+</tr>
+<tr class="row-odd"><td><p>spark.comet.parquet.read.parallel.io.thread-pool.size</p></td>
+<td><p>The maximum number of parallel threads the parallel reader will use in a single executor. For executors configured with a smaller number of cores, use a smaller number.</p></td>
+<td><p>16</p></td>
+</tr>
+<tr class="row-even"><td><p>spark.comet.regexp.allowIncompatible</p></td>
 <td><p>Comet is not currently fully compatible with Spark for all regular expressions. Set this config to true to allow them anyway using Rust’s regular expression engine. See compatibility guide for more information.</p></td>
 <td><p>false</p></td>
 </tr>
-<tr class="row-even"><td><p>spark.comet.scan.enabled</p></td>
+<tr class="row-odd"><td><p>spark.comet.scan.enabled</p></td>
 <td><p>Whether to enable native scans. When this is turned on, Spark will use Comet to read supported data sources (currently only Parquet is supported natively). Note that to enable native vectorized execution, both this config and ‘spark.comet.exec.enabled’ need to be enabled. By default, this config is true.</p></td>
 <td><p>true</p></td>
 </tr>
-<tr class="row-odd"><td><p>spark.comet.scan.preFetch.enabled</p></td>
+<tr class="row-even"><td><p>spark.comet.scan.preFetch.enabled</p></td>
 <td><p>Whether to enable pre-fetching feature of CometScan. By default is disabled.</p></td>
 <td><p>false</p></td>
 </tr>
-<tr class="row-even"><td><p>spark.comet.scan.preFetch.threadNum</p></td>
+<tr class="row-odd"><td><p>spark.comet.scan.preFetch.threadNum</p></td>
 <td><p>The number of threads running pre-fetching for CometScan. Effective if spark.comet.scan.preFetch.enabled is enabled. By default it is 2. Note that more pre-fetching threads means more memory requirement to store pre-fetched row groups.</p></td>
 <td><p>2</p></td>
 </tr>
-<tr class="row-odd"><td><p>spark.comet.shuffle.preferDictionary.ratio</p></td>
+<tr class="row-even"><td><p>spark.comet.shuffle.preferDictionary.ratio</p></td>
 <td><p>The ratio of total values to distinct values in a string column to decide whether to prefer dictionary encoding when shuffling the column. If the ratio is higher than this config, dictionary encoding will be used on shuffling string column. This config is effective if it is higher than 1.0. By default, this config is 10.0. Note that this config is only used when <code class="docutils literal notranslate"><span class="pre">spark.comet.exec.shuffle.mode</span></code> is <code class= [...]
 <td><p>10.0</p></td>
 </tr>
-<tr class="row-even"><td><p>spark.comet.sparkToColumnar.supportedOperatorList</p></td>
+<tr class="row-odd"><td><p>spark.comet.sparkToColumnar.supportedOperatorList</p></td>
 <td><p>A comma-separated list of operators that will be converted to Arrow columnar format when ‘spark.comet.sparkToColumnar.enabled’ is true</p></td>
 <td><p>Range,InMemoryTableScan</p></td>
 </tr>
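
For anyone trying the new Parquet parallel-reader options documented above, here
is a minimal sketch of setting them through a SparkSession builder. Only the
config keys and their defaults are taken from the docs in this commit; the chosen
values, the object name, and the s3a path are illustrative, and Comet itself is
assumed to be installed and enabled separately (e.g. via spark.plugins).

import org.apache.spark.sql.SparkSession

object CometParallelReaderTuning {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("comet-parallel-reader-tuning")
      // The parallel reader is on by default (true); set explicitly for clarity.
      .config("spark.comet.parquet.read.parallel.io.enabled", "true")
      // Default pool size is 16; executors with fewer cores should use less.
      .config("spark.comet.parquet.read.parallel.io.thread-pool.size", "8")
      // Merge read ranges separated by less than the delta; the default delta
      // is 8 MiB = 8 * 1024 * 1024 = 8388608 bytes.
      .config("spark.comet.parquet.read.io.mergeRanges", "true")
      .config("spark.comet.parquet.read.io.mergeRanges.delta", (8 * 1024 * 1024).toString)
      // Break up skewed read ranges at the cost of more filesystem connections.
      .config("spark.comet.parquet.read.io.adjust.readRange.skew", "true")
      .getOrCreate()

    // Hypothetical dataset; long continuous reads benefit most on cloud storage.
    spark.read.parquet("s3a://my-bucket/path/to/table").show()

    spark.stop()
  }
}

The same keys can equally be passed as --conf options to spark-submit.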


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
