This is an automated email from the ASF dual-hosted git repository.
github-bot pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/datafusion-comet.git
The following commit(s) were added to refs/heads/asf-site by this push:
new 67b93184 Publish built docs triggered by
77c9a6cc03b98e60d6c1b3d2805293826b5d3c2f
67b93184 is described below
commit 67b9318469ecb430cdbff37a740d42908bdd681e
Author: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
AuthorDate: Mon Jun 3 23:03:31 2024 +0000
Publish built docs triggered by 77c9a6cc03b98e60d6c1b3d2805293826b5d3c2f
---
_sources/contributor-guide/development.md.txt | 4 ++
_sources/user-guide/configs.md.txt | 1 -
contributor-guide/development.html | 4 ++
searchindex.js | 2 +-
user-guide/configs.html | 54 +++++++++++++--------------
5 files changed, 34 insertions(+), 31 deletions(-)
diff --git a/_sources/contributor-guide/development.md.txt
b/_sources/contributor-guide/development.md.txt
index 0121d9f4..913eea40 100644
--- a/_sources/contributor-guide/development.md.txt
+++ b/_sources/contributor-guide/development.md.txt
@@ -92,11 +92,13 @@ The plan stability testing framework is located in the
`spark` module and can be
```sh
./mvnw -pl spark
-Dsuites="org.apache.spark.sql.comet.CometTPCDSV1_4_PlanStabilitySuite" test
+./mvnw -pl spark
-Dsuites="org.apache.spark.sql.comet.CometTPCDSV1_4_PlanStabilitySuite"
-Pspark-4.0 -nsu test
```
and
```sh
./mvnw -pl spark
-Dsuites="org.apache.spark.sql.comet.CometTPCDSV2_7_PlanStabilitySuite" test
+./mvnw -pl spark
-Dsuites="org.apache.spark.sql.comet.CometTPCDSV2_7_PlanStabilitySuite"
-Pspark-4.0 -nsu test
```
If your pull request changes the query plans generated by Comet, you should
regenerate the golden files.
@@ -104,11 +106,13 @@ To regenerate the golden files, you can run the following
command:
```sh
SPARK_GENERATE_GOLDEN_FILES=1 ./mvnw -pl spark
-Dsuites="org.apache.spark.sql.comet.CometTPCDSV1_4_PlanStabilitySuite" test
+SPARK_GENERATE_GOLDEN_FILES=1 ./mvnw -pl spark
-Dsuites="org.apache.spark.sql.comet.CometTPCDSV1_4_PlanStabilitySuite"
-Pspark-4.0 -nsu test
```
and
```sh
SPARK_GENERATE_GOLDEN_FILES=1 ./mvnw -pl spark
-Dsuites="org.apache.spark.sql.comet.CometTPCDSV2_7_PlanStabilitySuite" test
+SPARK_GENERATE_GOLDEN_FILES=1 ./mvnw -pl spark
-Dsuites="org.apache.spark.sql.comet.CometTPCDSV2_7_PlanStabilitySuite"
-Pspark-4.0 -nsu test
```
## Benchmark
diff --git a/_sources/user-guide/configs.md.txt
b/_sources/user-guide/configs.md.txt
index eb349b34..104f29ce 100644
--- a/_sources/user-guide/configs.md.txt
+++ b/_sources/user-guide/configs.md.txt
@@ -23,7 +23,6 @@ Comet provides the following configuration settings.
| Config | Description | Default Value |
|--------|-------------|---------------|
-| spark.comet.ansi.enabled | Comet does not respect ANSI mode in most cases
and by default will not accelerate queries when ansi mode is enabled. Enable
this setting to test Comet's experimental support for ANSI mode. This should
not be used in production. | false |
| spark.comet.batchSize | The columnar batch size, i.e., the maximum number of
rows that a batch can contain. | 8192 |
| spark.comet.cast.allowIncompatible | Comet is not currently fully compatible
with Spark for all cast operations. Set this config to true to allow them
anyway. See compatibility guide for more information. | false |
| spark.comet.columnar.shuffle.async.enabled | Whether to enable asynchronous
shuffle for Arrow-based shuffle. By default, this config is false. | false |
diff --git a/contributor-guide/development.html
b/contributor-guide/development.html
index c2cd5d22..4a2e604f 100644
--- a/contributor-guide/development.html
+++ b/contributor-guide/development.html
@@ -442,19 +442,23 @@ in their name in <code class="docutils literal
notranslate"><span class="pre">or
<p>Comet has a plan stability testing framework that can be used to test the
stability of the query plans generated by Comet.
The plan stability testing framework is located in the <code class="docutils
literal notranslate"><span class="pre">spark</span></code> module and can be
run using the following command:</p>
<div class="highlight-sh notranslate"><div
class="highlight"><pre><span></span>./mvnw<span class="w"> </span>-pl<span
class="w"> </span>spark<span class="w"> </span>-Dsuites<span
class="o">=</span><span
class="s2">"org.apache.spark.sql.comet.CometTPCDSV1_4_PlanStabilitySuite"</span><span
class="w"> </span><span class="nb">test</span>
+./mvnw<span class="w"> </span>-pl<span class="w"> </span>spark<span class="w">
</span>-Dsuites<span class="o">=</span><span
class="s2">"org.apache.spark.sql.comet.CometTPCDSV1_4_PlanStabilitySuite"</span><span
class="w"> </span>-Pspark-4.0<span class="w"> </span>-nsu<span class="w">
</span><span class="nb">test</span>
</pre></div>
</div>
<p>and</p>
<div class="highlight-sh notranslate"><div
class="highlight"><pre><span></span>./mvnw<span class="w"> </span>-pl<span
class="w"> </span>spark<span class="w"> </span>-Dsuites<span
class="o">=</span><span
class="s2">"org.apache.spark.sql.comet.CometTPCDSV2_7_PlanStabilitySuite"</span><span
class="w"> </span><span class="nb">test</span>
+./mvnw<span class="w"> </span>-pl<span class="w"> </span>spark<span class="w">
</span>-Dsuites<span class="o">=</span><span
class="s2">"org.apache.spark.sql.comet.CometTPCDSV2_7_PlanStabilitySuite"</span><span
class="w"> </span>-Pspark-4.0<span class="w"> </span>-nsu<span class="w">
</span><span class="nb">test</span>
</pre></div>
</div>
<p>If your pull request changes the query plans generated by Comet, you should
regenerate the golden files.
To regenerate the golden files, you can run the following command:</p>
<div class="highlight-sh notranslate"><div
class="highlight"><pre><span></span><span
class="nv">SPARK_GENERATE_GOLDEN_FILES</span><span class="o">=</span><span
class="m">1</span><span class="w"> </span>./mvnw<span class="w">
</span>-pl<span class="w"> </span>spark<span class="w"> </span>-Dsuites<span
class="o">=</span><span
class="s2">"org.apache.spark.sql.comet.CometTPCDSV1_4_PlanStabilitySuite"</span><span
class="w"> </span><span class="nb">test</span>
+<span class="nv">SPARK_GENERATE_GOLDEN_FILES</span><span
class="o">=</span><span class="m">1</span><span class="w"> </span>./mvnw<span
class="w"> </span>-pl<span class="w"> </span>spark<span class="w">
</span>-Dsuites<span class="o">=</span><span
class="s2">"org.apache.spark.sql.comet.CometTPCDSV1_4_PlanStabilitySuite"</span><span
class="w"> </span>-Pspark-4.0<span class="w"> </span>-nsu<span class="w">
</span><span class="nb">test</span>
</pre></div>
</div>
<p>and</p>
<div class="highlight-sh notranslate"><div
class="highlight"><pre><span></span><span
class="nv">SPARK_GENERATE_GOLDEN_FILES</span><span class="o">=</span><span
class="m">1</span><span class="w"> </span>./mvnw<span class="w">
</span>-pl<span class="w"> </span>spark<span class="w"> </span>-Dsuites<span
class="o">=</span><span
class="s2">"org.apache.spark.sql.comet.CometTPCDSV2_7_PlanStabilitySuite"</span><span
class="w"> </span><span class="nb">test</span>
+<span class="nv">SPARK_GENERATE_GOLDEN_FILES</span><span
class="o">=</span><span class="m">1</span><span class="w"> </span>./mvnw<span
class="w"> </span>-pl<span class="w"> </span>spark<span class="w">
</span>-Dsuites<span class="o">=</span><span
class="s2">"org.apache.spark.sql.comet.CometTPCDSV2_7_PlanStabilitySuite"</span><span
class="w"> </span>-Pspark-4.0<span class="w"> </span>-nsu<span class="w">
</span><span class="nb">test</span>
</pre></div>
</div>
</section>
diff --git a/searchindex.js b/searchindex.js
index a7028338..c95c0488 100644
--- a/searchindex.js
+++ b/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"alltitles": {"ANSI mode": [[8, "ansi-mode"]], "API
Differences Between Spark Versions": [[0,
"api-differences-between-spark-versions"]], "ASF Links": [[7, null]], "Adding
Spark-side Tests for the New Expression": [[0,
"adding-spark-side-tests-for-the-new-expression"]], "Adding a New Expression":
[[0, "adding-a-new-expression"]], "Adding a New Scalar Function Expression":
[[0, "adding-a-new-scalar-function-expression"]], "Adding the Expression To the
Protobuf Definition" [...]
\ No newline at end of file
+Search.setIndex({"alltitles": {"ANSI mode": [[8, "ansi-mode"]], "API
Differences Between Spark Versions": [[0,
"api-differences-between-spark-versions"]], "ASF Links": [[7, null]], "Adding
Spark-side Tests for the New Expression": [[0,
"adding-spark-side-tests-for-the-new-expression"]], "Adding a New Expression":
[[0, "adding-a-new-expression"]], "Adding a New Scalar Function Expression":
[[0, "adding-a-new-scalar-function-expression"]], "Adding the Expression To the
Protobuf Definition" [...]
\ No newline at end of file
diff --git a/user-guide/configs.html b/user-guide/configs.html
index f248982d..4730e524 100644
--- a/user-guide/configs.html
+++ b/user-guide/configs.html
@@ -313,107 +313,103 @@ under the License.
</tr>
</thead>
<tbody>
-<tr class="row-even"><td><p>spark.comet.ansi.enabled</p></td>
-<td><p>Comet does not respect ANSI mode in most cases and by default will not
accelerate queries when ansi mode is enabled. Enable this setting to test
Comet’s experimental support for ANSI mode. This should not be used in
production.</p></td>
-<td><p>false</p></td>
-</tr>
-<tr class="row-odd"><td><p>spark.comet.batchSize</p></td>
+<tr class="row-even"><td><p>spark.comet.batchSize</p></td>
<td><p>The columnar batch size, i.e., the maximum number of rows that a batch
can contain.</p></td>
<td><p>8192</p></td>
</tr>
-<tr class="row-even"><td><p>spark.comet.cast.allowIncompatible</p></td>
+<tr class="row-odd"><td><p>spark.comet.cast.allowIncompatible</p></td>
<td><p>Comet is not currently fully compatible with Spark for all cast
operations. Set this config to true to allow them anyway. See compatibility
guide for more information.</p></td>
<td><p>false</p></td>
</tr>
-<tr class="row-odd"><td><p>spark.comet.columnar.shuffle.async.enabled</p></td>
+<tr class="row-even"><td><p>spark.comet.columnar.shuffle.async.enabled</p></td>
<td><p>Whether to enable asynchronous shuffle for Arrow-based shuffle. By
default, this config is false.</p></td>
<td><p>false</p></td>
</tr>
-<tr
class="row-even"><td><p>spark.comet.columnar.shuffle.async.max.thread.num</p></td>
+<tr
class="row-odd"><td><p>spark.comet.columnar.shuffle.async.max.thread.num</p></td>
<td><p>Maximum number of threads on an executor used for Comet async columnar
shuffle. By default, this config is 100. This is the upper bound of total
number of shuffle threads per executor. In other words, if the number of cores
* the number of shuffle threads per task <code class="docutils literal
notranslate"><span
class="pre">spark.comet.columnar.shuffle.async.thread.num</span></code> is
larger than this config. Comet will use this config as the number of shuffle
threads per executo [...]
<td><p>100</p></td>
</tr>
-<tr
class="row-odd"><td><p>spark.comet.columnar.shuffle.async.thread.num</p></td>
+<tr
class="row-even"><td><p>spark.comet.columnar.shuffle.async.thread.num</p></td>
<td><p>Number of threads used for Comet async columnar shuffle per shuffle
task. By default, this config is 3. Note that more threads means more memory
requirement to buffer shuffle data before flushing to disk. Also, more threads
may not always improve performance, and should be set based on the number of
cores available.</p></td>
<td><p>3</p></td>
</tr>
-<tr class="row-even"><td><p>spark.comet.columnar.shuffle.memory.factor</p></td>
+<tr class="row-odd"><td><p>spark.comet.columnar.shuffle.memory.factor</p></td>
<td><p>Fraction of Comet memory to be allocated per executor process for Comet
shuffle. Comet memory size is specified by <code class="docutils literal
notranslate"><span class="pre">spark.comet.memoryOverhead</span></code> or
calculated by <code class="docutils literal notranslate"><span
class="pre">spark.comet.memory.overhead.factor</span></code> * <code
class="docutils literal notranslate"><span
class="pre">spark.executor.memory</span></code>. By default, this config is
1.0.</p></td>
<td><p>1.0</p></td>
</tr>
-<tr class="row-odd"><td><p>spark.comet.debug.enabled</p></td>
+<tr class="row-even"><td><p>spark.comet.debug.enabled</p></td>
<td><p>Whether to enable debug mode for Comet. By default, this config is
false. When enabled, Comet will do additional checks for debugging purpose. For
example, validating array when importing arrays from JVM at native side. Note
that these checks may be expensive in performance and should only be enabled
for debugging purpose.</p></td>
<td><p>false</p></td>
</tr>
-<tr class="row-even"><td><p>spark.comet.enabled</p></td>
+<tr class="row-odd"><td><p>spark.comet.enabled</p></td>
<td><p>Whether to enable Comet extension for Spark. When this is turned on,
Spark will use Comet to read Parquet data source. Note that to enable native
vectorized execution, both this config and ‘spark.comet.exec.enabled’ need to
be enabled. By default, this config is the value of the env var <code
class="docutils literal notranslate"><span
class="pre">ENABLE_COMET</span></code> if set, or true otherwise.</p></td>
<td><p>true</p></td>
</tr>
-<tr class="row-odd"><td><p>spark.comet.exceptionOnDatetimeRebase</p></td>
+<tr class="row-even"><td><p>spark.comet.exceptionOnDatetimeRebase</p></td>
<td><p>Whether to throw exception when seeing dates/timestamps from the legacy
hybrid (Julian + Gregorian) calendar. Since Spark 3, dates/timestamps were
written according to the Proleptic Gregorian calendar. When this is true, Comet
will throw exceptions when seeing these dates/timestamps that were written by
Spark version before 3.0. If this is false, these dates/timestamps will be read
as if they were written to the Proleptic Gregorian calendar and will not be
rebased.</p></td>
<td><p>false</p></td>
</tr>
-<tr class="row-even"><td><p>spark.comet.exec.all.enabled</p></td>
+<tr class="row-odd"><td><p>spark.comet.exec.all.enabled</p></td>
<td><p>Whether to enable all Comet operators. By default, this config is
false. Note that this config precedes all separate config
‘spark.comet.exec.&lt;operator_name&gt;.enabled’. That being said, if this
config is enabled, separate configs are ignored.</p></td>
<td><p>false</p></td>
</tr>
-<tr class="row-odd"><td><p>spark.comet.exec.enabled</p></td>
+<tr class="row-even"><td><p>spark.comet.exec.enabled</p></td>
<td><p>Whether to enable Comet native vectorized execution for Spark. This
controls whether Spark should convert operators into their Comet counterparts
and execute them in native space. Note: each operator is associated with a
separate config in the format of
‘spark.comet.exec.&lt;operator_name&gt;.enabled’ at the moment, and both the
config and this need to be turned on, in order for the operator to be executed
in native. By default, this config is false.</p></td>
<td><p>false</p></td>
</tr>
-<tr class="row-even"><td><p>spark.comet.exec.memoryFraction</p></td>
+<tr class="row-odd"><td><p>spark.comet.exec.memoryFraction</p></td>
<td><p>The fraction of memory from Comet memory overhead that the native
memory manager can use for execution. The purpose of this config is to set
aside memory for untracked data structures, as well as imprecise size
estimation during memory acquisition. Default value is 0.7.</p></td>
<td><p>0.7</p></td>
</tr>
-<tr class="row-odd"><td><p>spark.comet.exec.shuffle.codec</p></td>
+<tr class="row-even"><td><p>spark.comet.exec.shuffle.codec</p></td>
<td><p>The codec of Comet native shuffle used to compress shuffle data. Only
zstd is supported.</p></td>
<td><p>zstd</p></td>
</tr>
-<tr class="row-even"><td><p>spark.comet.exec.shuffle.enabled</p></td>
+<tr class="row-odd"><td><p>spark.comet.exec.shuffle.enabled</p></td>
<td><p>Whether to enable Comet native shuffle. By default, this config is
false. Note that this requires setting ‘spark.shuffle.manager’ to
‘org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager’.
‘spark.shuffle.manager’ must be set before starting the Spark application and
cannot be changed during the application.</p></td>
<td><p>false</p></td>
</tr>
-<tr class="row-odd"><td><p>spark.comet.exec.shuffle.mode</p></td>
+<tr class="row-even"><td><p>spark.comet.exec.shuffle.mode</p></td>
<td><p>The mode of Comet shuffle. This config is only effective if Comet
shuffle is enabled. Available modes are ‘native’, ‘jvm’, and ‘auto’. ‘native’
is for native shuffle which has best performance in general. ‘jvm’ is for
jvm-based columnar shuffle which has higher coverage than native shuffle.
‘auto’ is for Comet to choose the best shuffle mode based on the query plan. By
default, this config is ‘jvm’.</p></td>
<td><p>jvm</p></td>
</tr>
-<tr class="row-even"><td><p>spark.comet.explainFallback.enabled</p></td>
+<tr class="row-odd"><td><p>spark.comet.explainFallback.enabled</p></td>
<td><p>When this setting is enabled, Comet will provide logging explaining the
reason(s) why a query stage cannot be executed natively.</p></td>
<td><p>false</p></td>
</tr>
-<tr class="row-odd"><td><p>spark.comet.memory.overhead.factor</p></td>
+<tr class="row-even"><td><p>spark.comet.memory.overhead.factor</p></td>
<td><p>Fraction of executor memory to be allocated as additional non-heap
memory per executor process for Comet. Default value is 0.2.</p></td>
<td><p>0.2</p></td>
</tr>
-<tr class="row-even"><td><p>spark.comet.memory.overhead.min</p></td>
+<tr class="row-odd"><td><p>spark.comet.memory.overhead.min</p></td>
<td><p>Minimum amount of additional memory to be allocated per executor
process for Comet, in MiB.</p></td>
<td><p>402653184b</p></td>
</tr>
-<tr class="row-odd"><td><p>spark.comet.nativeLoadRequired</p></td>
+<tr class="row-even"><td><p>spark.comet.nativeLoadRequired</p></td>
<td><p>Whether to require Comet native library to load successfully when Comet
is enabled. If not, Comet will silently fallback to Spark when it fails to load
the native lib. Otherwise, an error will be thrown and the Spark job will be
aborted.</p></td>
<td><p>false</p></td>
</tr>
-<tr class="row-even"><td><p>spark.comet.parquet.enable.directBuffer</p></td>
+<tr class="row-odd"><td><p>spark.comet.parquet.enable.directBuffer</p></td>
<td><p>Whether to use Java direct byte buffer when reading Parquet. By
default, this is false</p></td>
<td><p>false</p></td>
</tr>
-<tr
class="row-odd"><td><p>spark.comet.rowToColumnar.supportedOperatorList</p></td>
+<tr
class="row-even"><td><p>spark.comet.rowToColumnar.supportedOperatorList</p></td>
<td><p>A comma-separated list of row-based operators that will be converted to
columnar format when ‘spark.comet.rowToColumnar.enabled’ is true</p></td>
<td><p>Range,InMemoryTableScan</p></td>
</tr>
-<tr class="row-even"><td><p>spark.comet.scan.enabled</p></td>
+<tr class="row-odd"><td><p>spark.comet.scan.enabled</p></td>
<td><p>Whether to enable Comet scan. When this is turned on, Spark will use
Comet to read Parquet data source. Note that to enable native vectorized
execution, both this config and ‘spark.comet.exec.enabled’ need to be enabled.
By default, this config is true.</p></td>
<td><p>true</p></td>
</tr>
-<tr class="row-odd"><td><p>spark.comet.scan.preFetch.enabled</p></td>
+<tr class="row-even"><td><p>spark.comet.scan.preFetch.enabled</p></td>
<td><p>Whether to enable pre-fetching feature of CometScan. By default is
disabled.</p></td>
<td><p>false</p></td>
</tr>
-<tr class="row-even"><td><p>spark.comet.scan.preFetch.threadNum</p></td>
+<tr class="row-odd"><td><p>spark.comet.scan.preFetch.threadNum</p></td>
<td><p>The number of threads running pre-fetching for CometScan. Effective if
spark.comet.scan.preFetch.enabled is enabled. By default it is 2. Note that
more pre-fetching threads means more memory requirement to store pre-fetched
row groups.</p></td>
<td><p>2</p></td>
</tr>
-<tr class="row-odd"><td><p>spark.comet.shuffle.preferDictionary.ratio</p></td>
+<tr class="row-even"><td><p>spark.comet.shuffle.preferDictionary.ratio</p></td>
<td><p>The ratio of total values to distinct values in a string column to
decide whether to prefer dictionary encoding when shuffling the column. If the
ratio is higher than this config, dictionary encoding will be used on shuffling
string column. This config is effective if it is higher than 1.0. By default,
this config is 10.0. Note that this config is only used when
‘spark.comet.columnar.shuffle.enabled’ is true.</p></td>
<td><p>10.0</p></td>
</tr>