This is an automated email from the ASF dual-hosted git repository.
github-bot pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/datafusion-comet.git
The following commit(s) were added to refs/heads/asf-site by this push:
new fbe21f95 Publish built docs triggered by
837c256f0de16ea06b04bdc84503367b8a87be03
fbe21f95 is described below
commit fbe21f95669a79fd6d9158d67a07e10a49b8a5ee
Author: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
AuthorDate: Tue Oct 8 21:16:38 2024 +0000
Publish built docs triggered by 837c256f0de16ea06b04bdc84503367b8a87be03
---
_sources/contributor-guide/plugin_overview.md.txt | 4 +-
_sources/index.rst.txt | 2 +
_sources/user-guide/installation.md.txt | 107 +++++--------
_sources/user-guide/overview.md.txt | 34 ++--
_sources/user-guide/source.md.txt | 69 ++++++++
_static/images/CometNativeExecution.drawio.png | Bin 61017 -> 0 bytes
_static/images/CometNativeParquetReader.drawio | 100 ++++++++++++
_static/images/CometNativeParquetReader.drawio.svg | 4 +
_static/images/CometNativeParquetScan.drawio.png | Bin 75703 -> 0 bytes
_static/images/CometOverviewDetailed.drawio | 94 +++++++++++
_static/images/CometOverviewDetailed.drawio.svg | 4 +
contributor-guide/adding_a_new_expression.html | 10 ++
contributor-guide/benchmark-results/tpc-ds.html | 10 ++
contributor-guide/benchmark-results/tpc-h.html | 10 ++
contributor-guide/benchmarking.html | 10 ++
contributor-guide/contributing.html | 10 ++
contributor-guide/debugging.html | 10 ++
contributor-guide/development.html | 10 ++
contributor-guide/plugin_overview.html | 14 +-
contributor-guide/profiling_native_code.html | 10 ++
contributor-guide/spark-sql-tests.html | 10 ++
genindex.html | 10 ++
index.html | 12 ++
objects.inv | Bin 751 -> 773 bytes
search.html | 10 ++
searchindex.js | 2 +-
user-guide/compatibility.html | 10 ++
user-guide/configs.html | 10 ++
user-guide/datasources.html | 16 +-
user-guide/datatypes.html | 10 ++
user-guide/expressions.html | 10 ++
user-guide/installation.html | 174 ++++++++++-----------
user-guide/kubernetes.html | 28 +++-
user-guide/operators.html | 10 ++
user-guide/overview.html | 47 +++---
user-guide/{datasources.html => source.html} | 88 +++++++----
user-guide/tuning.html | 10 ++
37 files changed, 724 insertions(+), 245 deletions(-)
diff --git a/_sources/contributor-guide/plugin_overview.md.txt
b/_sources/contributor-guide/plugin_overview.md.txt
index c7538290..a211ca6b 100644
--- a/_sources/contributor-guide/plugin_overview.md.txt
+++ b/_sources/contributor-guide/plugin_overview.md.txt
@@ -79,10 +79,10 @@ The leaf nodes in the physical plan are always `ScanExec`
and these operators co
prepared before the plan is executed. When `CometExecIterator` invokes
`Native.executePlan` it passes the memory
addresses of these Arrow arrays to the native code.
-
+
## End to End Flow
The following diagram shows the end-to-end flow.
-
+
diff --git a/_sources/index.rst.txt b/_sources/index.rst.txt
index 4bf5d9fd..39ad27a5 100644
--- a/_sources/index.rst.txt
+++ b/_sources/index.rst.txt
@@ -42,6 +42,8 @@ as a native runtime to achieve improvement in terms of query
efficiency and quer
Comet Overview <user-guide/overview>
Installing Comet <user-guide/installation>
+ Building From Source <user-guide/source>
+ Kubernetes Guide <user-guide/kubernetes>
Supported Data Sources <user-guide/datasources>
Supported Data Types <user-guide/datatypes>
Supported Operators <user-guide/operators>
diff --git a/_sources/user-guide/installation.md.txt
b/_sources/user-guide/installation.md.txt
index dc4429b8..343b6586 100644
--- a/_sources/user-guide/installation.md.txt
+++ b/_sources/user-guide/installation.md.txt
@@ -19,73 +19,54 @@
# Installing DataFusion Comet
+## Prerequisites
+
Make sure the following requirements are met and software installed on your
machine.
-## Supported Platforms
+### Supported Operating Systems
- Linux
- Apple OSX (Intel and Apple Silicon)
-## Requirements
+### Supported Spark Versions
-- [Apache Spark supported by
Comet](overview.md#supported-apache-spark-versions)
-- JDK 8 and up
-- GLIBC 2.17 (Centos 7) and up
+Comet currently supports the following versions of Apache Spark:
-## Deploying to Kubernetes
+- 3.3.x (Java 8/11/17, Scala 2.12/2.13)
+- 3.4.x (Java 8/11/17, Scala 2.12/2.13)
+- 3.5.x (Java 8/11/17, Scala 2.12/2.13)
-See the [Comet Kubernetes Guide](kubernetes.md) guide.
-
-## Using a Published JAR File
+Experimental support is provided for the following versions of Apache Spark. It is intended for development and testing
+only and should not be used in production yet.
-Pre-built jar files are available in Maven central at
https://central.sonatype.com/namespace/org.apache.datafusion
+- 4.0.0-preview1 (Java 17/21, Scala 2.13)
-## Using a Published Source Release
-
-Official source releases can be downloaded from
https://dist.apache.org/repos/dist/release/datafusion/
-
-```console
-# Pick the latest version
-export COMET_VERSION=0.3.0
-# Download the tarball
-curl -O
"https://dist.apache.org/repos/dist/release/datafusion/datafusion-comet-$COMET_VERSION/apache-datafusion-comet-$COMET_VERSION.tar.gz"
-# Unpack
-tar -xzf apache-datafusion-comet-$COMET_VERSION.tar.gz
-cd apache-datafusion-comet-$COMET_VERSION
-```
+Note that Comet may not fully work with proprietary forks of Apache Spark such
as the Spark versions offered by
+Cloud Service Providers.
-Build
-
-```console
-make release-nogit PROFILES="-Pspark-3.4"
-```
-
-## Building from the GitHub repository
+## Using a Published JAR File
-Clone the repository:
+Comet jar files are available in [Maven
Central](https://central.sonatype.com/namespace/org.apache.datafusion).
-```console
-git clone https://github.com/apache/datafusion-comet.git
-```
+Here are the direct links for downloading the Comet jar file.
-Build Comet for a specific Spark version:
+- [Comet plugin for Spark 3.3 / Scala
2.12](https://repo1.maven.org/maven2/org/apache/datafusion/comet-spark-spark3.3_2.12/0.3.0/comet-spark-spark3.3_2.12-0.3.0.jar)
+- [Comet plugin for Spark 3.3 / Scala
2.13](https://repo1.maven.org/maven2/org/apache/datafusion/comet-spark-spark3.3_2.13/0.3.0/comet-spark-spark3.3_2.13-0.3.0.jar)
+- [Comet plugin for Spark 3.4 / Scala
2.12](https://repo1.maven.org/maven2/org/apache/datafusion/comet-spark-spark3.4_2.12/0.3.0/comet-spark-spark3.4_2.12-0.3.0.jar)
+- [Comet plugin for Spark 3.4 / Scala
2.13](https://repo1.maven.org/maven2/org/apache/datafusion/comet-spark-spark3.4_2.13/0.3.0/comet-spark-spark3.4_2.13-0.3.0.jar)
+- [Comet plugin for Spark 3.5 / Scala
2.12](https://repo1.maven.org/maven2/org/apache/datafusion/comet-spark-spark3.5_2.12/0.3.0/comet-spark-spark3.5_2.12-0.3.0.jar)
+- [Comet plugin for Spark 3.5 / Scala
2.13](https://repo1.maven.org/maven2/org/apache/datafusion/comet-spark-spark3.5_2.13/0.3.0/comet-spark-spark3.5_2.13-0.3.0.jar)
-```console
-cd datafusion-comet
-make release PROFILES="-Pspark-3.4"
-```
+## Building from source
-Note that the project builds for Scala 2.12 by default but can be built for
Scala 2.13 using an additional profile:
+Refer to the [Building from Source] guide for instructions on building Comet from source, either from official
+source releases or from the latest code in the GitHub repository.
-```console
-make release PROFILES="-Pspark-3.4 -Pscala-2.13"
-```
+[Building from Source]: source.md
-To build Comet from the source distribution on an isolated environment without
an access to `github.com` it is necessary to disable
`git-commit-id-maven-plugin`, otherwise you will face errors that there is no
access to the git during the build process. In that case you may use:
+## Deploying to Kubernetes
-```console
-make release-nogit PROFILES="-Pspark-3.4"
-```
+See the [Comet Kubernetes Guide](kubernetes.md).
## Run Spark Shell with Comet enabled
@@ -99,11 +80,10 @@ $SPARK_HOME/bin/spark-shell \
--conf spark.driver.extraClassPath=$COMET_JAR \
--conf spark.executor.extraClassPath=$COMET_JAR \
--conf spark.plugins=org.apache.spark.CometPlugin \
- --conf spark.comet.enabled=true \
- --conf spark.comet.exec.enabled=true \
+ --conf spark.shuffle.manager=org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager \
--conf spark.comet.explainFallback.enabled=true \
- --conf spark.driver.memory=1g \
- --conf spark.executor.memory=1g
+ --conf spark.memory.offHeap.enabled=true \
+ --conf spark.memory.offHeap.size=16g
```
### Verify Comet enabled for Spark SQL query
@@ -142,20 +122,9 @@ WARN CometSparkSessionExtensions$CometExecRule: Comet
cannot execute some parts
- Execute InsertIntoHadoopFsRelationCommand is not supported
```
-### Enable Comet shuffle
+## Additional Configuration
-Comet shuffle feature is disabled by default. To enable it, please add related
configs:
-
-```
---conf
spark.shuffle.manager=org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager
---conf spark.comet.exec.shuffle.enabled=true
-```
-
-Above configs enable Comet native shuffle which only supports hash partition
and single partition.
-Comet native shuffle doesn't support complex types yet.
-
-Comet doesn't have official release yet so currently the only way to test it
is to build jar and include it in your
-Spark application. Depending on your deployment mode you may also need to set
the driver & executor class path(s) to
+Depending on your deployment mode you may also need to set the driver &
executor class path(s) to
explicitly contain Comet otherwise Spark may use a different class-loader for
the Comet components than its internal
components which will then fail at runtime. For example:
@@ -165,11 +134,7 @@ components which will then fail at runtime. For example:
Some cluster managers may require additional configuration, see
<https://spark.apache.org/docs/latest/cluster-overview.html>
-To enable columnar shuffle which supports all partitioning and basic complex
types, one more config is required:
-
-```
---conf spark.comet.exec.shuffle.mode=jvm
-```
-
### Memory tuning
-In addition to Apache Spark memory configuration parameters the Comet
introduces own parameters to configure memory allocation for native execution.
More [Comet Memory Tuning](./tuning.md)
+
+In addition to Apache Spark memory configuration parameters, Comet introduces
additional parameters to configure memory
+allocation for native execution. See [Comet Memory Tuning](./tuning.md) for
details.
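
The direct download links in the "Using a Published JAR File" section all follow one naming pattern, so the URL can be assembled from the Spark, Scala, and Comet versions. The sketch below is an illustration only: `comet_jar_url` is a hypothetical helper, and the pattern is inferred from the 0.3.0 links listed in that section, so it may not hold for other releases.

```shell
# Hypothetical helper: build the Maven Central URL for a Comet jar.
# The pattern is inferred from the 0.3.0 links in the installation guide
# and is an assumption for any other release.
comet_jar_url() {
  spark="$1"    # Spark minor version, e.g. 3.4
  scala="$2"    # Scala binary version, e.g. 2.12
  version="$3"  # Comet release, e.g. 0.3.0
  artifact="comet-spark-spark${spark}_${scala}"
  printf '%s\n' "https://repo1.maven.org/maven2/org/apache/datafusion/${artifact}/${version}/${artifact}-${version}.jar"
}

# Print the URL for Spark 3.4 / Scala 2.12 / Comet 0.3.0.
comet_jar_url 3.4 2.12 0.3.0

# To download the jar and expose it as $COMET_JAR (requires network access):
#   curl -fLO "$(comet_jar_url 3.4 2.12 0.3.0)"
#   export COMET_JAR="$PWD/comet-spark-spark3.4_2.12-0.3.0.jar"
```

The `COMET_JAR` variable is the one the `spark-shell` command in the installation guide passes to `spark.driver.extraClassPath` and `spark.executor.extraClassPath`.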
diff --git a/_sources/user-guide/overview.md.txt
b/_sources/user-guide/overview.md.txt
index e386aec8..92dfe2bb 100644
--- a/_sources/user-guide/overview.md.txt
+++ b/_sources/user-guide/overview.md.txt
@@ -19,8 +19,14 @@
# Comet Overview
-Comet runs Spark SQL queries using the native Apache DataFusion runtime, which
is
-typically faster and more resource efficient than JVM based runtimes.
+Apache DataFusion Comet is a high-performance accelerator for Apache Spark,
built on top of the powerful
+[Apache DataFusion] query engine. Comet is designed to significantly enhance
the
+performance of Apache Spark workloads while leveraging commodity hardware and
seamlessly integrating with the
+Spark ecosystem without requiring any code changes.
+
+[Apache DataFusion]: https://datafusion.apache.org
+
+The following diagram provides an overview of Comet's architecture.

@@ -34,26 +40,10 @@ Comet aims to support:
## Architecture
-The following diagram illustrates the architecture of Comet:
+The following diagram shows how Comet integrates with Apache Spark.

-## Supported Apache Spark versions
-
-Comet currently supports the following versions of Apache Spark:
-
-- 3.3.x
-- 3.4.x
-- 3.5.x
-
-Experimental support is provided for the following versions of Apache Spark
and is intended for development/testing
-use only and should not be used in production yet.
-
-- 4.0.0-preview1
-
-Note that Comet may not fully work with proprietary forks of Apache Spark such
as the Spark versions offered by
-Cloud Service Providers.
-
## Feature Parity with Apache Spark
The project strives to keep feature parity with Apache Spark, that is,
@@ -65,3 +55,9 @@ features and fallback to Spark engine.
To achieve this, besides unit tests within Comet itself, we also re-use
Spark SQL tests and make sure they all pass with Comet extension
enabled.
+
+## Getting Started
+
+Refer to the [Comet Installation Guide] to get started.
+
+[Comet Installation Guide]: installation.md
diff --git a/_sources/user-guide/source.md.txt
b/_sources/user-guide/source.md.txt
new file mode 100644
index 00000000..71c9060c
--- /dev/null
+++ b/_sources/user-guide/source.md.txt
@@ -0,0 +1,69 @@
+<!---
+ Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+-->
+
+# Building Comet From Source
+
+It is sometimes preferable to build from source for a specific platform.
+
+## Using a Published Source Release
+
+Official source releases can be downloaded from
https://dist.apache.org/repos/dist/release/datafusion/
+
+```console
+# Pick the latest version
+export COMET_VERSION=0.3.0
+# Download the tarball
+curl -O
"https://dist.apache.org/repos/dist/release/datafusion/datafusion-comet-$COMET_VERSION/apache-datafusion-comet-$COMET_VERSION.tar.gz"
+# Unpack
+tar -xzf apache-datafusion-comet-$COMET_VERSION.tar.gz
+cd apache-datafusion-comet-$COMET_VERSION
+```
+
+Build
+
+```console
+make release-nogit PROFILES="-Pspark-3.4"
+```
+
+## Building from the GitHub repository
+
+Clone the repository:
+
+```console
+git clone https://github.com/apache/datafusion-comet.git
+```
+
+Build Comet for a specific Spark version:
+
+```console
+cd datafusion-comet
+make release PROFILES="-Pspark-3.4"
+```
+
+Note that the project builds for Scala 2.12 by default but can be built for
Scala 2.13 using an additional profile:
+
+```console
+make release PROFILES="-Pspark-3.4 -Pscala-2.13"
+```
+
+To build Comet from the source distribution in an isolated environment without access to `github.com`, you must disable `git-commit-id-maven-plugin`; otherwise the build fails with errors about missing git access. In that case, use:
+
+```console
+make release-nogit PROFILES="-Pspark-3.4"
+```
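
The build variants described in the new `source.md` differ only in the make target and the `PROFILES` value, which the following sketch makes explicit. `comet_make_cmd` is a hypothetical helper, not part of the Comet build. The guide only shows `-Pspark-3.4`, so profile names for other Spark versions, and the combination of `release-nogit` with `-Pscala-2.13`, are assumptions.

```shell
# Hypothetical helper: print the make command for a Comet build.
# Only -Pspark-3.4 and -Pscala-2.13 appear in the guide; other profile
# names are assumed to follow the same pattern.
comet_make_cmd() {
  spark="$1"           # Spark minor version, e.g. 3.4
  scala="${2:-2.12}"   # Scala 2.12 is the default build
  use_git="${3:-yes}"  # "no" selects the release-nogit target

  profiles="-Pspark-${spark}"
  if [ "$scala" = "2.13" ]; then
    profiles="${profiles} -Pscala-2.13"
  fi

  if [ "$use_git" = "no" ]; then
    target="release-nogit"
  else
    target="release"
  fi

  printf 'make %s PROFILES="%s"\n' "$target" "$profiles"
}

comet_make_cmd 3.4           # build from a git checkout, Scala 2.12
comet_make_cmd 3.4 2.13      # same, with the Scala 2.13 profile
comet_make_cmd 3.4 2.12 no   # source release or isolated environment
```

The helper only prints the command, so it can be reviewed before running it from the repository root.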
diff --git a/_static/images/CometNativeExecution.drawio.png
b/_static/images/CometNativeExecution.drawio.png
deleted file mode 100644
index ba122a1f..00000000
Binary files a/_static/images/CometNativeExecution.drawio.png and /dev/null
differ
diff --git a/_static/images/CometNativeParquetReader.drawio
b/_static/images/CometNativeParquetReader.drawio
new file mode 100644
index 00000000..0c7304ef
--- /dev/null
+++ b/_static/images/CometNativeParquetReader.drawio
@@ -0,0 +1,100 @@
+<mxfile host="app.diagrams.net" agent="Mozilla/5.0 (Macintosh; Intel Mac OS X
10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.6 Safari/605.1.15"
version="24.7.16">
+ <diagram name="Page-1" id="IdYZ_KFENTEXElLiOEKC">
+ <mxGraphModel dx="1133" dy="729" grid="1" gridSize="10" guides="1"
tooltips="1" connect="1" arrows="1" fold="1" page="1" pageScale="1"
pageWidth="850" pageHeight="1100" math="0" shadow="0">
+ <root>
+ <mxCell id="0" />
+ <mxCell id="1" parent="0" />
+ <mxCell id="wVAZ-YzccNhZugPFJvmi-30" value="Spark Executor"
style="rounded=1;whiteSpace=wrap;html=1;dashed=1;verticalAlign=top;" vertex="1"
parent="1">
+ <mxGeometry x="10" y="40" width="510" height="430" as="geometry" />
+ </mxCell>
+ <mxCell id="AH3lBTSLKK5181iXBnnY-2" value="JVM Code"
style="rounded=1;whiteSpace=wrap;html=1;verticalAlign=top;" parent="1"
vertex="1">
+ <mxGeometry x="30" y="70" width="210" height="380" as="geometry" />
+ </mxCell>
+ <mxCell id="wVAZ-YzccNhZugPFJvmi-24"
style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;exitX=0.75;exitY=1;exitDx=0;exitDy=0;entryX=0.75;entryY=0;entryDx=0;entryDy=0;"
edge="1" parent="1" source="t5OBkkhKOG6cYtw1sPyQ-18"
target="wVAZ-YzccNhZugPFJvmi-13">
+ <mxGeometry relative="1" as="geometry" />
+ </mxCell>
+ <mxCell id="t5OBkkhKOG6cYtw1sPyQ-18" value="Comet Parquet
Reader<div><br></div><div><br></div><div>IO
and Decompression</div>"
style="rounded=1;whiteSpace=wrap;html=1;verticalAlign=top;" parent="1"
vertex="1">
+ <mxGeometry x="45" y="110" width="180" height="100" as="geometry" />
+ </mxCell>
+ <mxCell id="wVAZ-YzccNhZugPFJvmi-1" value="Native Code"
style="rounded=1;whiteSpace=wrap;html=1;verticalAlign=top;" vertex="1"
parent="1">
+ <mxGeometry x="290" y="70" width="210" height="380" as="geometry" />
+ </mxCell>
+ <mxCell id="wVAZ-YzccNhZugPFJvmi-21"
style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;exitX=0;exitY=0.75;exitDx=0;exitDy=0;entryX=1;entryY=0.75;entryDx=0;entryDy=0;"
edge="1" parent="1" source="wVAZ-YzccNhZugPFJvmi-2"
target="wVAZ-YzccNhZugPFJvmi-13">
+ <mxGeometry relative="1" as="geometry" />
+ </mxCell>
+ <mxCell id="wVAZ-YzccNhZugPFJvmi-2" value="Native Execution Plan"
style="rounded=1;whiteSpace=wrap;html=1;verticalAlign=top;" vertex="1"
parent="1">
+ <mxGeometry x="310" y="240" width="170" height="100" as="geometry" />
+ </mxCell>
+ <mxCell id="wVAZ-YzccNhZugPFJvmi-19"
style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;exitX=0;exitY=0.75;exitDx=0;exitDy=0;entryX=1;entryY=0.75;entryDx=0;entryDy=0;"
edge="1" parent="1" source="wVAZ-YzccNhZugPFJvmi-4"
target="t5OBkkhKOG6cYtw1sPyQ-18">
+ <mxGeometry relative="1" as="geometry" />
+ </mxCell>
+ <mxCell id="wVAZ-YzccNhZugPFJvmi-4" value="Parquet Decoding"
style="rounded=1;whiteSpace=wrap;html=1;verticalAlign=top;" vertex="1"
parent="1">
+ <mxGeometry x="305" y="110" width="180" height="100" as="geometry" />
+ </mxCell>
+ <mxCell id="wVAZ-YzccNhZugPFJvmi-6" value=""
style="shape=image;verticalLabelPosition=bottom;labelBackgroundColor=default;verticalAlign=top;aspect=fixed;imageAspect=0;image=data:image/svg+xml,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIGZpbGw9Im5vbmUiIHZpZXdCb3g9IjAgMCA4MDEgMTY4IiBoZWlnaHQ9IjE2OCIgd2lkdGg9IjgwMSI+JiN4YTs8ZyBjbGlwLXBhdGg9InVybCgjY2xpcDBfMV8xODEpIj4mI3hhOzxwYXRoIGZpbGw9InVybCgjcGFpbnQwX2xpbmVhcl8xXzE4MSkiIGQ9Ik03Ni4xMjk3IDE2OEM4OC40NTk3IDE2OCA5OS42MDk3IDE1
[...]
+ <mxGeometry x="323.48" y="273.6" width="143.03" height="30"
as="geometry" />
+ </mxCell>
+ <mxCell id="wVAZ-YzccNhZugPFJvmi-7" value=""
style="shape=image;verticalLabelPosition=bottom;labelBackgroundColor=default;verticalAlign=top;aspect=fixed;imageAspect=0;image=data:image/png,iVBORw0KGgoAAAANSUhEUgAABwgAAAOoCAMAAADyHlBJAAADAFBMVEUAAAABAQECAgIDAwMEBAQFBQUGBgYHBwcICAgJCQkKCgoLCwsMDAwNDQ0ODg4PDw8QEBARERESEhITExMUFBQVFRUWFhYXFxcYGBgZGRkaGhobGxscHBwdHR0eHh4fHx8gICAhISEiIiIjIyMkJCQlJSUmJiYnJycoKCgpKSkqKiorKyssLCwtLS0uLi4vLy8wMDAxMTEyMjIzMzM0NDQ1NTU2NjY3Nzc4ODg5OTk6Ojo7Ozs8
[...]
+ <mxGeometry x="360" y="303.6" width="70" height="36.4" as="geometry"
/>
+ </mxCell>
+ <mxCell id="wVAZ-YzccNhZugPFJvmi-10" value=""
style="shape=flexArrow;endArrow=classic;html=1;rounded=0;exitX=0.5;exitY=1;exitDx=0;exitDy=0;entryX=0.5;entryY=0;entryDx=0;entryDy=0;"
edge="1" parent="1">
+ <mxGeometry width="50" height="50" relative="1" as="geometry">
+ <mxPoint x="394.5" y="340" as="sourcePoint" />
+ <mxPoint x="394.5" y="370" as="targetPoint" />
+ </mxGeometry>
+ </mxCell>
+ <mxCell id="wVAZ-YzccNhZugPFJvmi-11" value="Shuffle Files"
style="shape=process;whiteSpace=wrap;html=1;backgroundOutline=1;fillColor=#f5f5f5;fontColor=#333333;strokeColor=#666666;"
vertex="1" parent="1">
+ <mxGeometry x="310" y="370" width="170" height="50" as="geometry" />
+ </mxCell>
+ <mxCell id="wVAZ-YzccNhZugPFJvmi-20"
style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;exitX=1;exitY=0.25;exitDx=0;exitDy=0;entryX=0;entryY=0.25;entryDx=0;entryDy=0;"
edge="1" parent="1" source="wVAZ-YzccNhZugPFJvmi-13"
target="wVAZ-YzccNhZugPFJvmi-2">
+ <mxGeometry relative="1" as="geometry" />
+ </mxCell>
+ <mxCell id="wVAZ-YzccNhZugPFJvmi-28" value="executePlan()"
style="edgeLabel;html=1;align=center;verticalAlign=middle;resizable=0;points=[];"
vertex="1" connectable="0" parent="wVAZ-YzccNhZugPFJvmi-20">
+ <mxGeometry x="-0.1059" y="2" relative="1" as="geometry">
+ <mxPoint y="11" as="offset" />
+ </mxGeometry>
+ </mxCell>
+ <mxCell id="wVAZ-YzccNhZugPFJvmi-23"
style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;exitX=0.25;exitY=0;exitDx=0;exitDy=0;entryX=0.25;entryY=1;entryDx=0;entryDy=0;"
edge="1" parent="1" source="wVAZ-YzccNhZugPFJvmi-13"
target="t5OBkkhKOG6cYtw1sPyQ-18">
+ <mxGeometry relative="1" as="geometry" />
+ </mxCell>
+ <mxCell id="wVAZ-YzccNhZugPFJvmi-25"
style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;exitX=0.75;exitY=1;exitDx=0;exitDy=0;entryX=0.75;entryY=0;entryDx=0;entryDy=0;"
edge="1" parent="1" source="wVAZ-YzccNhZugPFJvmi-13"
target="wVAZ-YzccNhZugPFJvmi-14">
+ <mxGeometry relative="1" as="geometry" />
+ </mxCell>
+ <mxCell id="wVAZ-YzccNhZugPFJvmi-13" value="CometExecIterator"
style="rounded=1;whiteSpace=wrap;html=1;verticalAlign=middle;" vertex="1"
parent="1">
+ <mxGeometry x="45" y="240" width="180" height="100" as="geometry" />
+ </mxCell>
+ <mxCell id="wVAZ-YzccNhZugPFJvmi-22"
style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;exitX=0.25;exitY=0;exitDx=0;exitDy=0;entryX=0.25;entryY=1;entryDx=0;entryDy=0;"
edge="1" parent="1" source="wVAZ-YzccNhZugPFJvmi-14"
target="wVAZ-YzccNhZugPFJvmi-13">
+ <mxGeometry relative="1" as="geometry" />
+ </mxCell>
+ <mxCell id="wVAZ-YzccNhZugPFJvmi-26" value="next()"
style="edgeLabel;html=1;align=center;verticalAlign=middle;resizable=0;points=[];"
vertex="1" connectable="0" parent="wVAZ-YzccNhZugPFJvmi-22">
+ <mxGeometry x="0.0667" y="1" relative="1" as="geometry">
+ <mxPoint x="21" as="offset" />
+ </mxGeometry>
+ </mxCell>
+ <mxCell id="wVAZ-YzccNhZugPFJvmi-14" value="Spark Execution Logic"
style="rounded=1;whiteSpace=wrap;html=1;verticalAlign=middle;" vertex="1"
parent="1">
+ <mxGeometry x="45" y="370" width="180" height="40" as="geometry" />
+ </mxCell>
+ <mxCell id="wVAZ-YzccNhZugPFJvmi-15" value=""
style="shape=image;verticalLabelPosition=bottom;labelBackgroundColor=default;verticalAlign=top;aspect=fixed;imageAspect=0;image=data:image/png,iVBORw0KGgoAAAANSUhEUgAABwgAAAOoCAMAAADyHlBJAAADAFBMVEUAAAABAQECAgIDAwMEBAQFBQUGBgYHBwcICAgJCQkKCgoLCwsMDAwNDQ0ODg4PDw8QEBARERESEhITExMUFBQVFRUWFhYXFxcYGBgZGRkaGhobGxscHBwdHR0eHh4fHx8gICAhISEiIiIjIyMkJCQlJSUmJiYnJycoKCgpKSkqKiorKyssLCwtLS0uLi4vLy8wMDAxMTEyMjIzMzM0NDQ1NTU2NjY3Nzc4ODg5OTk6Ojo7Ozs
[...]
+ <mxGeometry x="360" y="173.60000000000002" width="70" height="36.4"
as="geometry" />
+ </mxCell>
+ <mxCell id="wVAZ-YzccNhZugPFJvmi-16" value=""
style="shape=flexArrow;endArrow=classic;html=1;rounded=0;exitX=0.5;exitY=1;exitDx=0;exitDy=0;entryX=0.5;entryY=0;entryDx=0;entryDy=0;"
edge="1" parent="1">
+ <mxGeometry width="50" height="50" relative="1" as="geometry">
+ <mxPoint x="394.5" y="210" as="sourcePoint" />
+ <mxPoint x="394.5" y="240" as="targetPoint" />
+ </mxGeometry>
+ </mxCell>
+ <mxCell id="wVAZ-YzccNhZugPFJvmi-18"
style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;exitX=1;exitY=0.25;exitDx=0;exitDy=0;entryX=0;entryY=0.25;entryDx=0;entryDy=0;"
edge="1" parent="1" source="t5OBkkhKOG6cYtw1sPyQ-18"
target="wVAZ-YzccNhZugPFJvmi-4">
+ <mxGeometry relative="1" as="geometry" />
+ </mxCell>
+ <mxCell id="wVAZ-YzccNhZugPFJvmi-29" value="decode()"
style="edgeLabel;html=1;align=center;verticalAlign=middle;resizable=0;points=[];"
vertex="1" connectable="0" parent="wVAZ-YzccNhZugPFJvmi-18">
+ <mxGeometry x="-0.025" y="-3" relative="1" as="geometry">
+ <mxPoint y="12" as="offset" />
+ </mxGeometry>
+ </mxCell>
+ <mxCell id="wVAZ-YzccNhZugPFJvmi-27" value="next()"
style="edgeLabel;html=1;align=center;verticalAlign=middle;resizable=0;points=[];"
vertex="1" connectable="0" parent="1">
+ <mxGeometry x="110" y="220" as="geometry" />
+ </mxCell>
+ </root>
+ </mxGraphModel>
+ </diagram>
+</mxfile>
diff --git a/_static/images/CometNativeParquetReader.drawio.svg
b/_static/images/CometNativeParquetReader.drawio.svg
new file mode 100644
index 00000000..0c1f93c7
--- /dev/null
+++ b/_static/images/CometNativeParquetReader.drawio.svg
@@ -0,0 +1,4 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!-- Do not edit this file with editors other than draw.io -->
+<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN"
"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
+<svg xmlns="http://www.w3.org/2000/svg"
xmlns:xlink="http://www.w3.org/1999/xlink" version="1.1" width="511px"
height="431px" viewBox="-0.5 -0.5 511 431" content="<mxfile
host="app.diagrams.net" agent="Mozilla/5.0 (Macintosh; Intel Mac
OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.6
Safari/605.1.15" version="24.7.16" scale="1"
border="0"> <diagram name="Page-1"
id="IdYZ_KFENTEXElLiOEKC&quo [...]
\ No newline at end of file
diff --git a/_static/images/CometNativeParquetScan.drawio.png
b/_static/images/CometNativeParquetScan.drawio.png
deleted file mode 100644
index 712cbae4..00000000
Binary files a/_static/images/CometNativeParquetScan.drawio.png and /dev/null
differ
diff --git a/_static/images/CometOverviewDetailed.drawio
b/_static/images/CometOverviewDetailed.drawio
new file mode 100644
index 00000000..ff7f4c59
--- /dev/null
+++ b/_static/images/CometOverviewDetailed.drawio
@@ -0,0 +1,94 @@
+<mxfile host="app.diagrams.net" agent="Mozilla/5.0 (Macintosh; Intel Mac OS X
10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.6 Safari/605.1.15"
version="24.7.16">
+ <diagram name="Page-1" id="IdYZ_KFENTEXElLiOEKC">
+ <mxGraphModel dx="1193" dy="827" grid="1" gridSize="10" guides="1"
tooltips="1" connect="1" arrows="1" fold="1" page="1" pageScale="1"
pageWidth="850" pageHeight="1100" math="0" shadow="0">
+ <root>
+ <mxCell id="0" />
+ <mxCell id="1" parent="0" />
+ <mxCell id="AH3lBTSLKK5181iXBnnY-2" value="Spark Executor"
style="rounded=1;whiteSpace=wrap;html=1;verticalAlign=top;" parent="1"
vertex="1">
+ <mxGeometry x="290" width="210" height="430" as="geometry" />
+ </mxCell>
+ <mxCell id="AH3lBTSLKK5181iXBnnY-16" value="Spark Driver"
style="rounded=1;whiteSpace=wrap;html=1;verticalAlign=top;" parent="1"
vertex="1">
+ <mxGeometry y="40" width="200" height="350" as="geometry" />
+ </mxCell>
+ <mxCell id="AH3lBTSLKK5181iXBnnY-17" value=""
style="shape=image;verticalLabelPosition=bottom;labelBackgroundColor=default;verticalAlign=top;aspect=fixed;imageAspect=0;image=data:image/svg+xml,PHN2ZyB4bWxuczp4bGluaz0iaHR0cDovL3d3dy53My5vcmcvMTk5OS94bGluayIgeG1sbnM9Imh0dHA6Ly93d3cudzMub3JnLzIwMDAvc3ZnIiBzdHlsZT0iZmlsbC1ydWxlOmV2ZW5vZGQ7Y2xpcC1ydWxlOmV2ZW5vZGQ7c3Ryb2tlLWxpbmVqb2luOnJvdW5kO3N0cm9rZS1taXRlcmxpbWl0OjI7IiB4bWw6c3BhY2U9InByZXNlcnZlIiB2ZXJzaW9uPSIxLjEiIHZpZXdCb3g9IjAgMCA
[...]
+ <mxGeometry x="34.519999999999996" y="200" width="125.48"
height="30" as="geometry" />
+ </mxCell>
+ <mxCell id="t5OBkkhKOG6cYtw1sPyQ-1" value="Spark Logical Plan"
style="rounded=1;whiteSpace=wrap;html=1;" vertex="1" parent="1">
+ <mxGeometry x="10" y="80" width="180" height="30" as="geometry" />
+ </mxCell>
+ <mxCell id="t5OBkkhKOG6cYtw1sPyQ-2" value="Spark Physical Plan"
style="rounded=1;whiteSpace=wrap;html=1;" vertex="1" parent="1">
+ <mxGeometry x="10" y="140" width="180" height="30" as="geometry" />
+ </mxCell>
+ <mxCell id="t5OBkkhKOG6cYtw1sPyQ-3" value="Comet Physical Plan"
style="rounded=1;whiteSpace=wrap;html=1;verticalAlign=top;" vertex="1"
parent="1">
+ <mxGeometry x="10" y="260" width="180" height="100" as="geometry" />
+ </mxCell>
+ <mxCell id="t5OBkkhKOG6cYtw1sPyQ-4" value="protobuf intermediate
representation"
style="shape=process;whiteSpace=wrap;html=1;backgroundOutline=1;fillColor=#f5f5f5;fontColor=#333333;strokeColor=#666666;"
vertex="1" parent="1">
+ <mxGeometry x="40" y="290" width="120" height="50" as="geometry" />
+ </mxCell>
+ <mxCell id="t5OBkkhKOG6cYtw1sPyQ-12" value=""
style="shape=flexArrow;endArrow=classic;html=1;rounded=0;exitX=0.5;exitY=1;exitDx=0;exitDy=0;entryX=0.5;entryY=0;entryDx=0;entryDy=0;"
edge="1" parent="1" source="t5OBkkhKOG6cYtw1sPyQ-1"
target="t5OBkkhKOG6cYtw1sPyQ-2">
+ <mxGeometry width="50" height="50" relative="1" as="geometry">
+ <mxPoint x="270" y="270" as="sourcePoint" />
+ <mxPoint x="320" y="220" as="targetPoint" />
+ </mxGeometry>
+ </mxCell>
+ <mxCell id="t5OBkkhKOG6cYtw1sPyQ-13" value=""
style="shape=flexArrow;endArrow=classic;html=1;rounded=0;exitX=0.5;exitY=1;exitDx=0;exitDy=0;entryX=0.5;entryY=0;entryDx=0;entryDy=0;"
edge="1" parent="1">
+ <mxGeometry width="50" height="50" relative="1" as="geometry">
+ <mxPoint x="96.75999999999999" y="170" as="sourcePoint" />
+ <mxPoint x="96.75999999999999" y="200" as="targetPoint" />
+ </mxGeometry>
+ </mxCell>
+ <mxCell id="t5OBkkhKOG6cYtw1sPyQ-14" value=""
style="shape=flexArrow;endArrow=classic;html=1;rounded=0;exitX=0.5;exitY=1;exitDx=0;exitDy=0;entryX=0.5;entryY=0;entryDx=0;entryDy=0;"
edge="1" parent="1">
+ <mxGeometry width="50" height="50" relative="1" as="geometry">
+ <mxPoint x="96.75999999999999" y="230" as="sourcePoint" />
+ <mxPoint x="96.75999999999999" y="260" as="targetPoint" />
+ </mxGeometry>
+ </mxCell>
+ <mxCell id="t5OBkkhKOG6cYtw1sPyQ-15" value=""
style="shape=flexArrow;endArrow=classic;html=1;rounded=0;endWidth=28;endSize=9.67;width=11;fillColor=#000000;"
edge="1" parent="1">
+ <mxGeometry width="50" height="50" relative="1" as="geometry">
+ <mxPoint x="200" y="204.5" as="sourcePoint" />
+ <mxPoint x="290" y="204.5" as="targetPoint" />
+ </mxGeometry>
+ </mxCell>
+ <mxCell id="t5OBkkhKOG6cYtw1sPyQ-16" value="Native Execution Plan"
style="rounded=1;whiteSpace=wrap;html=1;verticalAlign=top;" vertex="1"
parent="1">
+ <mxGeometry x="310" y="230" width="170" height="100" as="geometry" />
+ </mxCell>
+ <mxCell id="t5OBkkhKOG6cYtw1sPyQ-17" value=""
style="shape=image;verticalLabelPosition=bottom;labelBackgroundColor=default;verticalAlign=top;aspect=fixed;imageAspect=0;image=data:image/svg+xml,PHN2ZyB4bWxuczp4bGluaz0iaHR0cDovL3d3dy53My5vcmcvMTk5OS94bGluayIgeG1sbnM9Imh0dHA6Ly93d3cudzMub3JnLzIwMDAvc3ZnIiBzdHlsZT0iZmlsbC1ydWxlOmV2ZW5vZGQ7Y2xpcC1ydWxlOmV2ZW5vZGQ7c3Ryb2tlLWxpbmVqb2luOnJvdW5kO3N0cm9rZS1taXRlcmxpbWl0OjI7IiB4bWw6c3BhY2U9InByZXNlcnZlIiB2ZXJzaW9uPSIxLjEiIHZpZXdCb3g9IjAgMCA
[...]
+ <mxGeometry x="332.26" y="170" width="125.48" height="30"
as="geometry" />
+ </mxCell>
+ <mxCell id="t5OBkkhKOG6cYtw1sPyQ-18" value="Comet Physical Plan"
style="rounded=1;whiteSpace=wrap;html=1;verticalAlign=top;" vertex="1"
parent="1">
+ <mxGeometry x="305" y="40" width="180" height="100" as="geometry" />
+ </mxCell>
+ <mxCell id="t5OBkkhKOG6cYtw1sPyQ-19" value="protobuf intermediate
representation"
style="shape=process;whiteSpace=wrap;html=1;backgroundOutline=1;fillColor=#f5f5f5;fontColor=#333333;strokeColor=#666666;"
vertex="1" parent="1">
+ <mxGeometry x="335" y="70" width="120" height="50" as="geometry" />
+ </mxCell>
+ <mxCell id="t5OBkkhKOG6cYtw1sPyQ-20" value=""
style="shape=image;verticalLabelPosition=bottom;labelBackgroundColor=default;verticalAlign=top;aspect=fixed;imageAspect=0;image=data:image/svg+xml,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIGZpbGw9Im5vbmUiIHZpZXdCb3g9IjAgMCA4MDEgMTY4IiBoZWlnaHQ9IjE2OCIgd2lkdGg9IjgwMSI+JiN4YTs8ZyBjbGlwLXBhdGg9InVybCgjY2xpcDBfMV8xODEpIj4mI3hhOzxwYXRoIGZpbGw9InVybCgjcGFpbnQwX2xpbmVhcl8xXzE4MSkiIGQ9Ik03Ni4xMjk3IDE2OEM4OC40NTk3IDE2OCA5OS42MDk3IDE
[...]
+ <mxGeometry x="323.48" y="263.6" width="143.03" height="30"
as="geometry" />
+ </mxCell>
+ <mxCell id="t5OBkkhKOG6cYtw1sPyQ-21" value=""
style="shape=image;verticalLabelPosition=bottom;labelBackgroundColor=default;verticalAlign=top;aspect=fixed;imageAspect=0;image=data:image/png,iVBORw0KGgoAAAANSUhEUgAABwgAAAOoCAMAAADyHlBJAAADAFBMVEUAAAABAQECAgIDAwMEBAQFBQUGBgYHBwcICAgJCQkKCgoLCwsMDAwNDQ0ODg4PDw8QEBARERESEhITExMUFBQVFRUWFhYXFxcYGBgZGRkaGhobGxscHBwdHR0eHh4fHx8gICAhISEiIiIjIyMkJCQlJSUmJiYnJycoKCgpKSkqKiorKyssLCwtLS0uLi4vLy8wMDAxMTEyMjIzMzM0NDQ1NTU2NjY3Nzc4ODg5OTk6Ojo7Ozs
[...]
+ <mxGeometry x="360" y="293.6" width="70" height="36.4" as="geometry"
/>
+ </mxCell>
+ <mxCell id="t5OBkkhKOG6cYtw1sPyQ-22" value=""
style="shape=flexArrow;endArrow=classic;html=1;rounded=0;exitX=0.5;exitY=1;exitDx=0;exitDy=0;entryX=0.5;entryY=0;entryDx=0;entryDy=0;"
edge="1" parent="1">
+ <mxGeometry width="50" height="50" relative="1" as="geometry">
+ <mxPoint x="394.5" y="140" as="sourcePoint" />
+ <mxPoint x="394.5" y="170" as="targetPoint" />
+ </mxGeometry>
+ </mxCell>
+ <mxCell id="t5OBkkhKOG6cYtw1sPyQ-23" value=""
style="shape=flexArrow;endArrow=classic;html=1;rounded=0;exitX=0.5;exitY=1;exitDx=0;exitDy=0;entryX=0.5;entryY=0;entryDx=0;entryDy=0;"
edge="1" parent="1" source="t5OBkkhKOG6cYtw1sPyQ-17"
target="t5OBkkhKOG6cYtw1sPyQ-16">
+ <mxGeometry width="50" height="50" relative="1" as="geometry">
+ <mxPoint x="140" y="210" as="sourcePoint" />
+ <mxPoint x="140" y="240" as="targetPoint" />
+ </mxGeometry>
+ </mxCell>
+ <mxCell id="t5OBkkhKOG6cYtw1sPyQ-24" value=""
style="shape=flexArrow;endArrow=classic;html=1;rounded=0;exitX=0.5;exitY=1;exitDx=0;exitDy=0;entryX=0.5;entryY=0;entryDx=0;entryDy=0;"
edge="1" parent="1">
+ <mxGeometry width="50" height="50" relative="1" as="geometry">
+ <mxPoint x="394.5" y="330" as="sourcePoint" />
+ <mxPoint x="394.5" y="360" as="targetPoint" />
+ </mxGeometry>
+ </mxCell>
+ <mxCell id="t5OBkkhKOG6cYtw1sPyQ-25" value="Shuffle Files"
style="shape=process;whiteSpace=wrap;html=1;backgroundOutline=1;fillColor=#f5f5f5;fontColor=#333333;strokeColor=#666666;"
vertex="1" parent="1">
+ <mxGeometry x="310" y="360" width="170" height="50" as="geometry" />
+ </mxCell>
+ </root>
+ </mxGraphModel>
+ </diagram>
+</mxfile>
diff --git a/_static/images/CometOverviewDetailed.drawio.svg
b/_static/images/CometOverviewDetailed.drawio.svg
new file mode 100644
index 00000000..0f29083b
--- /dev/null
+++ b/_static/images/CometOverviewDetailed.drawio.svg
@@ -0,0 +1,4 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!-- Do not edit this file with editors other than draw.io -->
+<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN"
"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
+<svg xmlns="http://www.w3.org/2000/svg"
xmlns:xlink="http://www.w3.org/1999/xlink" version="1.1" width="501px"
height="431px" viewBox="-0.5 -0.5 501 431" content="<mxfile
host="app.diagrams.net" agent="Mozilla/5.0 (Macintosh; Intel Mac
OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.6
Safari/605.1.15" version="24.7.16" scale="1"
border="0"> <diagram name="Page-1"
id="IdYZ_KFENTEXElLiOEKC&quo [...]
\ No newline at end of file
diff --git a/contributor-guide/adding_a_new_expression.html
b/contributor-guide/adding_a_new_expression.html
index ca4d5ab4..310ffd28 100644
--- a/contributor-guide/adding_a_new_expression.html
+++ b/contributor-guide/adding_a_new_expression.html
@@ -118,6 +118,16 @@ under the License.
Installing Comet
</a>
</li>
+ <li class="toctree-l1">
+ <a class="reference internal" href="../user-guide/source.html">
+ Building From Source
+ </a>
+ </li>
+ <li class="toctree-l1">
+ <a class="reference internal" href="../user-guide/kubernetes.html">
+ Kubernetes Guide
+ </a>
+ </li>
<li class="toctree-l1">
<a class="reference internal" href="../user-guide/datasources.html">
Supported Data Sources
diff --git a/contributor-guide/benchmark-results/tpc-ds.html
b/contributor-guide/benchmark-results/tpc-ds.html
index 964db271..9bb2fe9f 100644
--- a/contributor-guide/benchmark-results/tpc-ds.html
+++ b/contributor-guide/benchmark-results/tpc-ds.html
@@ -116,6 +116,16 @@ under the License.
Installing Comet
</a>
</li>
+ <li class="toctree-l1">
+ <a class="reference internal" href="../../user-guide/source.html">
+ Building From Source
+ </a>
+ </li>
+ <li class="toctree-l1">
+ <a class="reference internal" href="../../user-guide/kubernetes.html">
+ Kubernetes Guide
+ </a>
+ </li>
<li class="toctree-l1">
<a class="reference internal" href="../../user-guide/datasources.html">
Supported Data Sources
diff --git a/contributor-guide/benchmark-results/tpc-h.html
b/contributor-guide/benchmark-results/tpc-h.html
index 6b603bce..1141ecbf 100644
--- a/contributor-guide/benchmark-results/tpc-h.html
+++ b/contributor-guide/benchmark-results/tpc-h.html
@@ -116,6 +116,16 @@ under the License.
Installing Comet
</a>
</li>
+ <li class="toctree-l1">
+ <a class="reference internal" href="../../user-guide/source.html">
+ Building From Source
+ </a>
+ </li>
+ <li class="toctree-l1">
+ <a class="reference internal" href="../../user-guide/kubernetes.html">
+ Kubernetes Guide
+ </a>
+ </li>
<li class="toctree-l1">
<a class="reference internal" href="../../user-guide/datasources.html">
Supported Data Sources
diff --git a/contributor-guide/benchmarking.html
b/contributor-guide/benchmarking.html
index 76cc967f..4f2ab05b 100644
--- a/contributor-guide/benchmarking.html
+++ b/contributor-guide/benchmarking.html
@@ -118,6 +118,16 @@ under the License.
Installing Comet
</a>
</li>
+ <li class="toctree-l1">
+ <a class="reference internal" href="../user-guide/source.html">
+ Building From Source
+ </a>
+ </li>
+ <li class="toctree-l1">
+ <a class="reference internal" href="../user-guide/kubernetes.html">
+ Kubernetes Guide
+ </a>
+ </li>
<li class="toctree-l1">
<a class="reference internal" href="../user-guide/datasources.html">
Supported Data Sources
diff --git a/contributor-guide/contributing.html
b/contributor-guide/contributing.html
index b53f1435..35235beb 100644
--- a/contributor-guide/contributing.html
+++ b/contributor-guide/contributing.html
@@ -118,6 +118,16 @@ under the License.
Installing Comet
</a>
</li>
+ <li class="toctree-l1">
+ <a class="reference internal" href="../user-guide/source.html">
+ Building From Source
+ </a>
+ </li>
+ <li class="toctree-l1">
+ <a class="reference internal" href="../user-guide/kubernetes.html">
+ Kubernetes Guide
+ </a>
+ </li>
<li class="toctree-l1">
<a class="reference internal" href="../user-guide/datasources.html">
Supported Data Sources
diff --git a/contributor-guide/debugging.html b/contributor-guide/debugging.html
index ebd00f00..bea3fdde 100644
--- a/contributor-guide/debugging.html
+++ b/contributor-guide/debugging.html
@@ -118,6 +118,16 @@ under the License.
Installing Comet
</a>
</li>
+ <li class="toctree-l1">
+ <a class="reference internal" href="../user-guide/source.html">
+ Building From Source
+ </a>
+ </li>
+ <li class="toctree-l1">
+ <a class="reference internal" href="../user-guide/kubernetes.html">
+ Kubernetes Guide
+ </a>
+ </li>
<li class="toctree-l1">
<a class="reference internal" href="../user-guide/datasources.html">
Supported Data Sources
diff --git a/contributor-guide/development.html
b/contributor-guide/development.html
index db06f6de..61152a6e 100644
--- a/contributor-guide/development.html
+++ b/contributor-guide/development.html
@@ -118,6 +118,16 @@ under the License.
Installing Comet
</a>
</li>
+ <li class="toctree-l1">
+ <a class="reference internal" href="../user-guide/source.html">
+ Building From Source
+ </a>
+ </li>
+ <li class="toctree-l1">
+ <a class="reference internal" href="../user-guide/kubernetes.html">
+ Kubernetes Guide
+ </a>
+ </li>
<li class="toctree-l1">
<a class="reference internal" href="../user-guide/datasources.html">
Supported Data Sources
diff --git a/contributor-guide/plugin_overview.html
b/contributor-guide/plugin_overview.html
index 3336c8d9..b1e50ded 100644
--- a/contributor-guide/plugin_overview.html
+++ b/contributor-guide/plugin_overview.html
@@ -118,6 +118,16 @@ under the License.
Installing Comet
</a>
</li>
+ <li class="toctree-l1">
+ <a class="reference internal" href="../user-guide/source.html">
+ Building From Source
+ </a>
+ </li>
+ <li class="toctree-l1">
+ <a class="reference internal" href="../user-guide/kubernetes.html">
+ Kubernetes Guide
+ </a>
+ </li>
<li class="toctree-l1">
<a class="reference internal" href="../user-guide/datasources.html">
Supported Data Sources
@@ -393,12 +403,12 @@ until no more batches are available (meaning that all
data has been processed by
<p>The leaf nodes in the physical plan are always <code class="docutils
literal notranslate"><span class="pre">ScanExec</span></code> and these
operators consume batches of Arrow data that were
prepared before the plan is executed. When <code class="docutils literal
notranslate"><span class="pre">CometExecIterator</span></code> invokes <code
class="docutils literal notranslate"><span
class="pre">Native.executePlan</span></code> it passes the memory
addresses of these Arrow arrays to the native code.</p>
-<p><img alt="Diagram of Comet Native Execution"
src="../_static/images/CometNativeExecution.drawio.png" /></p>
+<p><img alt="Diagram of Comet Native Execution"
src="../_static/images/CometOverviewDetailed.drawio.svg" /></p>
</section>
<section id="end-to-end-flow">
<h2>End to End Flow<a class="headerlink" href="#end-to-end-flow" title="Link
to this heading">¶</a></h2>
<p>The following diagram shows the end-to-end flow.</p>
-<p><img alt="Diagram of Comet Native Parquet Scan"
src="../_static/images/CometNativeParquetScan.drawio.png" /></p>
+<p><img alt="Diagram of Comet Native Parquet Scan"
src="../_static/images/CometNativeParquetReader.drawio.svg" /></p>
</section>
</section>
diff --git a/contributor-guide/profiling_native_code.html
b/contributor-guide/profiling_native_code.html
index 482b6772..76bd7caa 100644
--- a/contributor-guide/profiling_native_code.html
+++ b/contributor-guide/profiling_native_code.html
@@ -118,6 +118,16 @@ under the License.
Installing Comet
</a>
</li>
+ <li class="toctree-l1">
+ <a class="reference internal" href="../user-guide/source.html">
+ Building From Source
+ </a>
+ </li>
+ <li class="toctree-l1">
+ <a class="reference internal" href="../user-guide/kubernetes.html">
+ Kubernetes Guide
+ </a>
+ </li>
<li class="toctree-l1">
<a class="reference internal" href="../user-guide/datasources.html">
Supported Data Sources
diff --git a/contributor-guide/spark-sql-tests.html
b/contributor-guide/spark-sql-tests.html
index 9f7294b1..a01ae469 100644
--- a/contributor-guide/spark-sql-tests.html
+++ b/contributor-guide/spark-sql-tests.html
@@ -117,6 +117,16 @@ under the License.
Installing Comet
</a>
</li>
+ <li class="toctree-l1">
+ <a class="reference internal" href="../user-guide/source.html">
+ Building From Source
+ </a>
+ </li>
+ <li class="toctree-l1">
+ <a class="reference internal" href="../user-guide/kubernetes.html">
+ Kubernetes Guide
+ </a>
+ </li>
<li class="toctree-l1">
<a class="reference internal" href="../user-guide/datasources.html">
Supported Data Sources
diff --git a/genindex.html b/genindex.html
index 03c18b7b..65fdeaa7 100644
--- a/genindex.html
+++ b/genindex.html
@@ -115,6 +115,16 @@ under the License.
Installing Comet
</a>
</li>
+ <li class="toctree-l1">
+ <a class="reference internal" href="user-guide/source.html">
+ Building From Source
+ </a>
+ </li>
+ <li class="toctree-l1">
+ <a class="reference internal" href="user-guide/kubernetes.html">
+ Kubernetes Guide
+ </a>
+ </li>
<li class="toctree-l1">
<a class="reference internal" href="user-guide/datasources.html">
Supported Data Sources
diff --git a/index.html b/index.html
index 52959e7e..12cf0223 100644
--- a/index.html
+++ b/index.html
@@ -117,6 +117,16 @@ under the License.
Installing Comet
</a>
</li>
+ <li class="toctree-l1">
+ <a class="reference internal" href="user-guide/source.html">
+ Building From Source
+ </a>
+ </li>
+ <li class="toctree-l1">
+ <a class="reference internal" href="user-guide/kubernetes.html">
+ Kubernetes Guide
+ </a>
+ </li>
<li class="toctree-l1">
<a class="reference internal" href="user-guide/datasources.html">
Supported Data Sources
@@ -308,6 +318,8 @@ as a native runtime to achieve improvement in terms of
query efficiency and quer
<ul>
<li class="toctree-l1"><a class="reference internal"
href="user-guide/overview.html">Comet Overview</a></li>
<li class="toctree-l1"><a class="reference internal"
href="user-guide/installation.html">Installing Comet</a></li>
+<li class="toctree-l1"><a class="reference internal"
href="user-guide/source.html">Building From Source</a></li>
+<li class="toctree-l1"><a class="reference internal"
href="user-guide/kubernetes.html">Kubernetes Guide</a></li>
<li class="toctree-l1"><a class="reference internal"
href="user-guide/datasources.html">Supported Data Sources</a></li>
<li class="toctree-l1"><a class="reference internal"
href="user-guide/datatypes.html">Supported Data Types</a></li>
<li class="toctree-l1"><a class="reference internal"
href="user-guide/operators.html">Supported Operators</a></li>
diff --git a/objects.inv b/objects.inv
index 013792d3..49c0080e 100644
Binary files a/objects.inv and b/objects.inv differ
diff --git a/search.html b/search.html
index 1d92f3b0..a5c37aa8 100644
--- a/search.html
+++ b/search.html
@@ -122,6 +122,16 @@ under the License.
Installing Comet
</a>
</li>
+ <li class="toctree-l1">
+ <a class="reference internal" href="user-guide/source.html">
+ Building From Source
+ </a>
+ </li>
+ <li class="toctree-l1">
+ <a class="reference internal" href="user-guide/kubernetes.html">
+ Kubernetes Guide
+ </a>
+ </li>
<li class="toctree-l1">
<a class="reference internal" href="user-guide/datasources.html">
Supported Data Sources
diff --git a/searchindex.js b/searchindex.js
index a69b5051..1e0fb6c4 100644
--- a/searchindex.js
+++ b/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"alltitles": {"1. Install Comet": [[9, "install-comet"]], "2.
Clone Spark and Apply Diff": [[9, "clone-spark-and-apply-diff"]], "3. Run Spark
SQL Tests": [[9, "run-spark-sql-tests"]], "ANSI mode": [[11, "ansi-mode"]],
"API Differences Between Spark Versions": [[0,
"api-differences-between-spark-versions"]], "ASF Links": [[10, null]], "Adding
Spark-side Tests for the New Expression": [[0,
"adding-spark-side-tests-for-the-new-expression"]], "Adding a New Expression":
[[0, [...]
\ No newline at end of file
+Search.setIndex({"alltitles": {"1. Install Comet": [[9, "install-comet"]], "2.
Clone Spark and Apply Diff": [[9, "clone-spark-and-apply-diff"]], "3. Run Spark
SQL Tests": [[9, "run-spark-sql-tests"]], "ANSI mode": [[11, "ansi-mode"]],
"API Differences Between Spark Versions": [[0,
"api-differences-between-spark-versions"]], "ASF Links": [[10, null]], "Adding
Spark-side Tests for the New Expression": [[0,
"adding-spark-side-tests-for-the-new-expression"]], "Adding a New Expression":
[[0, [...]
\ No newline at end of file
diff --git a/user-guide/compatibility.html b/user-guide/compatibility.html
index 4a0578fc..ef68d276 100644
--- a/user-guide/compatibility.html
+++ b/user-guide/compatibility.html
@@ -118,6 +118,16 @@ under the License.
Installing Comet
</a>
</li>
+ <li class="toctree-l1">
+ <a class="reference internal" href="source.html">
+ Building From Source
+ </a>
+ </li>
+ <li class="toctree-l1">
+ <a class="reference internal" href="kubernetes.html">
+ Kubernetes Guide
+ </a>
+ </li>
<li class="toctree-l1">
<a class="reference internal" href="datasources.html">
Supported Data Sources
diff --git a/user-guide/configs.html b/user-guide/configs.html
index 64decce4..88270bdc 100644
--- a/user-guide/configs.html
+++ b/user-guide/configs.html
@@ -118,6 +118,16 @@ under the License.
Installing Comet
</a>
</li>
+ <li class="toctree-l1">
+ <a class="reference internal" href="source.html">
+ Building From Source
+ </a>
+ </li>
+ <li class="toctree-l1">
+ <a class="reference internal" href="kubernetes.html">
+ Kubernetes Guide
+ </a>
+ </li>
<li class="toctree-l1">
<a class="reference internal" href="datasources.html">
Supported Data Sources
diff --git a/user-guide/datasources.html b/user-guide/datasources.html
index 2d459507..46aa8e23 100644
--- a/user-guide/datasources.html
+++ b/user-guide/datasources.html
@@ -54,7 +54,7 @@ under the License.
<link rel="index" title="Index" href="../genindex.html" />
<link rel="search" title="Search" href="../search.html" />
<link rel="next" title="Supported Spark Data Types" href="datatypes.html"
/>
- <link rel="prev" title="Installing DataFusion Comet"
href="installation.html" />
+ <link rel="prev" title="Comet Kubernetes Support" href="kubernetes.html" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<meta name="docsearch:language" content="en">
@@ -118,6 +118,16 @@ under the License.
Installing Comet
</a>
</li>
+ <li class="toctree-l1">
+ <a class="reference internal" href="source.html">
+ Building From Source
+ </a>
+ </li>
+ <li class="toctree-l1">
+ <a class="reference internal" href="kubernetes.html">
+ Kubernetes Guide
+ </a>
+ </li>
<li class="toctree-l1 current active">
<a class="current reference internal" href="#">
Supported Data Sources
@@ -359,11 +369,11 @@ converted into Arrow format, allowing native execution to
happen after that.</p>
<!-- Previous / next buttons -->
<div class='prev-next-area'>
- <a class='left-prev' id="prev-link" href="installation.html"
title="previous page">
+ <a class='left-prev' id="prev-link" href="kubernetes.html" title="previous
page">
<i class="fas fa-angle-left"></i>
<div class="prev-next-info">
<p class="prev-next-subtitle">previous</p>
- <p class="prev-next-title">Installing DataFusion Comet</p>
+ <p class="prev-next-title">Comet Kubernetes Support</p>
</div>
</a>
<a class='right-next' id="next-link" href="datatypes.html" title="next
page">
diff --git a/user-guide/datatypes.html b/user-guide/datatypes.html
index a9f60c80..72bc2d5b 100644
--- a/user-guide/datatypes.html
+++ b/user-guide/datatypes.html
@@ -118,6 +118,16 @@ under the License.
Installing Comet
</a>
</li>
+ <li class="toctree-l1">
+ <a class="reference internal" href="source.html">
+ Building From Source
+ </a>
+ </li>
+ <li class="toctree-l1">
+ <a class="reference internal" href="kubernetes.html">
+ Kubernetes Guide
+ </a>
+ </li>
<li class="toctree-l1">
<a class="reference internal" href="datasources.html">
Supported Data Sources
diff --git a/user-guide/expressions.html b/user-guide/expressions.html
index f4a94ba3..3b3a90db 100644
--- a/user-guide/expressions.html
+++ b/user-guide/expressions.html
@@ -118,6 +118,16 @@ under the License.
Installing Comet
</a>
</li>
+ <li class="toctree-l1">
+ <a class="reference internal" href="source.html">
+ Building From Source
+ </a>
+ </li>
+ <li class="toctree-l1">
+ <a class="reference internal" href="kubernetes.html">
+ Kubernetes Guide
+ </a>
+ </li>
<li class="toctree-l1">
<a class="reference internal" href="datasources.html">
Supported Data Sources
diff --git a/user-guide/installation.html b/user-guide/installation.html
index 8dacc617..113a1173 100644
--- a/user-guide/installation.html
+++ b/user-guide/installation.html
@@ -53,7 +53,7 @@ under the License.
<script async="true" defer="true"
src="https://buttons.github.io/buttons.js"></script>
<link rel="index" title="Index" href="../genindex.html" />
<link rel="search" title="Search" href="../search.html" />
- <link rel="next" title="Supported Spark Data Sources"
href="datasources.html" />
+ <link rel="next" title="Building Comet From Source" href="source.html" />
<link rel="prev" title="Comet Overview" href="overview.html" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<meta name="docsearch:language" content="en">
@@ -118,6 +118,16 @@ under the License.
Installing Comet
</a>
</li>
+ <li class="toctree-l1">
+ <a class="reference internal" href="source.html">
+ Building From Source
+ </a>
+ </li>
+ <li class="toctree-l1">
+ <a class="reference internal" href="kubernetes.html">
+ Kubernetes Guide
+ </a>
+ </li>
<li class="toctree-l1">
<a class="reference internal" href="datasources.html">
Supported Data Sources
@@ -272,19 +282,21 @@ under the License.
<nav id="bd-toc-nav">
<ul class="visible nav section-nav flex-column">
<li class="toc-h2 nav-item toc-entry">
- <a class="reference internal nav-link" href="#supported-platforms">
- Supported Platforms
- </a>
- </li>
- <li class="toc-h2 nav-item toc-entry">
- <a class="reference internal nav-link" href="#requirements">
- Requirements
- </a>
- </li>
- <li class="toc-h2 nav-item toc-entry">
- <a class="reference internal nav-link" href="#deploying-to-kubernetes">
- Deploying to Kubernetes
+ <a class="reference internal nav-link" href="#prerequisites">
+ Prerequisites
</a>
+ <ul class="nav section-nav flex-column">
+ <li class="toc-h3 nav-item toc-entry">
+ <a class="reference internal nav-link" href="#supported-operating-systems">
+ Supported Operating Systems
+ </a>
+ </li>
+ <li class="toc-h3 nav-item toc-entry">
+ <a class="reference internal nav-link" href="#supported-spark-versions">
+ Supported Spark Versions
+ </a>
+ </li>
+ </ul>
</li>
<li class="toc-h2 nav-item toc-entry">
<a class="reference internal nav-link" href="#using-a-published-jar-file">
@@ -292,13 +304,13 @@ under the License.
</a>
</li>
<li class="toc-h2 nav-item toc-entry">
- <a class="reference internal nav-link"
href="#using-a-published-source-release">
- Using a Published Source Release
+ <a class="reference internal nav-link" href="#building-from-source">
+ Building from source
</a>
</li>
<li class="toc-h2 nav-item toc-entry">
- <a class="reference internal nav-link"
href="#building-from-the-github-repository">
- Building from the GitHub repository
+ <a class="reference internal nav-link" href="#deploying-to-kubernetes">
+ Deploying to Kubernetes
</a>
</li>
<li class="toc-h2 nav-item toc-entry">
@@ -311,11 +323,13 @@ under the License.
Verify Comet enabled for Spark SQL query
</a>
</li>
- <li class="toc-h3 nav-item toc-entry">
- <a class="reference internal nav-link" href="#enable-comet-shuffle">
- Enable Comet shuffle
- </a>
- </li>
+ </ul>
+ </li>
+ <li class="toc-h2 nav-item toc-entry">
+ <a class="reference internal nav-link" href="#additional-configuration">
+ Additional Configuration
+ </a>
+ <ul class="nav section-nav flex-column">
<li class="toc-h3 nav-item toc-entry">
<a class="reference internal nav-link" href="#memory-tuning">
Memory tuning
@@ -371,66 +385,54 @@ under the License.
-->
<section id="installing-datafusion-comet">
<h1>Installing DataFusion Comet<a class="headerlink"
href="#installing-datafusion-comet" title="Link to this heading">¶</a></h1>
+<section id="prerequisites">
+<h2>Prerequisites<a class="headerlink" href="#prerequisites" title="Link to
this heading">¶</a></h2>
<p>Make sure the following requirements are met and software installed on your
machine.</p>
-<section id="supported-platforms">
-<h2>Supported Platforms<a class="headerlink" href="#supported-platforms"
title="Link to this heading">¶</a></h2>
+<section id="supported-operating-systems">
+<h3>Supported Operating Systems<a class="headerlink"
href="#supported-operating-systems" title="Link to this heading">¶</a></h3>
<ul class="simple">
<li><p>Linux</p></li>
<li><p>Apple OSX (Intel and Apple Silicon)</p></li>
</ul>
</section>
-<section id="requirements">
-<h2>Requirements<a class="headerlink" href="#requirements" title="Link to this
heading">¶</a></h2>
+<section id="supported-spark-versions">
+<h3>Supported Spark Versions<a class="headerlink"
href="#supported-spark-versions" title="Link to this heading">¶</a></h3>
+<p>Comet currently supports the following versions of Apache Spark:</p>
+<ul class="simple">
+<li><p>3.3.x (Java 8/11/17, Scala 2.12/2.13)</p></li>
+<li><p>3.4.x (Java 8/11/17, Scala 2.12/2.13)</p></li>
+<li><p>3.5.x (Java 8/11/17, Scala 2.12/2.13)</p></li>
+</ul>
+<p>Experimental support is provided for the following versions of Apache Spark.
It is intended for development and testing
+use only and should not be used in production yet.</p>
<ul class="simple">
-<li><p><a class="reference internal"
href="overview.html#supported-apache-spark-versions"><span class="std
std-ref">Apache Spark supported by Comet</span></a></p></li>
-<li><p>JDK 8 and up</p></li>
-<li><p>GLIBC 2.17 (Centos 7) and up</p></li>
+<li><p>4.0.0-preview1 (Java 17/21, Scala 2.13)</p></li>
</ul>
+<p>Note that Comet may not fully work with proprietary forks of Apache Spark
such as the Spark versions offered by
+Cloud Service Providers.</p>
</section>
-<section id="deploying-to-kubernetes">
-<h2>Deploying to Kubernetes<a class="headerlink"
href="#deploying-to-kubernetes" title="Link to this heading">¶</a></h2>
-<p>See the <a class="reference internal" href="kubernetes.html"><span
class="std std-doc">Comet Kubernetes Guide</span></a> guide.</p>
</section>
<section id="using-a-published-jar-file">
<h2>Using a Published JAR File<a class="headerlink"
href="#using-a-published-jar-file" title="Link to this heading">¶</a></h2>
-<p>Pre-built jar files are available in Maven central at
https://central.sonatype.com/namespace/org.apache.datafusion</p>
+<p>Comet jar files are available in <a class="reference external"
href="https://central.sonatype.com/namespace/org.apache.datafusion">Maven
Central</a>.</p>
+<p>Here are the direct links for downloading the Comet jar file.</p>
+<ul class="simple">
+<li><p><a class="reference external"
href="https://repo1.maven.org/maven2/org/apache/datafusion/comet-spark-spark3.3_2.12/0.3.0/comet-spark-spark3.3_2.12-0.3.0.jar">Comet
plugin for Spark 3.3 / Scala 2.12</a></p></li>
+<li><p><a class="reference external"
href="https://repo1.maven.org/maven2/org/apache/datafusion/comet-spark-spark3.3_2.13/0.3.0/comet-spark-spark3.3_2.13-0.3.0.jar">Comet
plugin for Spark 3.3 / Scala 2.13</a></p></li>
+<li><p><a class="reference external"
href="https://repo1.maven.org/maven2/org/apache/datafusion/comet-spark-spark3.4_2.12/0.3.0/comet-spark-spark3.4_2.12-0.3.0.jar">Comet
plugin for Spark 3.4 / Scala 2.12</a></p></li>
+<li><p><a class="reference external"
href="https://repo1.maven.org/maven2/org/apache/datafusion/comet-spark-spark3.4_2.13/0.3.0/comet-spark-spark3.4_2.13-0.3.0.jar">Comet
plugin for Spark 3.4 / Scala 2.13</a></p></li>
+<li><p><a class="reference external"
href="https://repo1.maven.org/maven2/org/apache/datafusion/comet-spark-spark3.5_2.12/0.3.0/comet-spark-spark3.5_2.12-0.3.0.jar">Comet
plugin for Spark 3.5 / Scala 2.12</a></p></li>
+<li><p><a class="reference external"
href="https://repo1.maven.org/maven2/org/apache/datafusion/comet-spark-spark3.5_2.13/0.3.0/comet-spark-spark3.5_2.13-0.3.0.jar">Comet
plugin for Spark 3.5 / Scala 2.13</a></p></li>
+</ul>
</section>
-<section id="using-a-published-source-release">
-<h2>Using a Published Source Release<a class="headerlink"
href="#using-a-published-source-release" title="Link to this heading">¶</a></h2>
-<p>Official source releases can be downloaded from
https://dist.apache.org/repos/dist/release/datafusion/</p>
-<div class="highlight-console notranslate"><div
class="highlight"><pre><span></span><span class="gp"># </span>Pick<span
class="w"> </span>the<span class="w"> </span>latest<span class="w">
</span>version
-<span class="go">export COMET_VERSION=0.3.0</span>
-<span class="gp"># </span>Download<span class="w"> </span>the<span class="w">
</span>tarball
-<span class="go">curl -O
"https://dist.apache.org/repos/dist/release/datafusion/datafusion-comet-$COMET_VERSION/apache-datafusion-comet-$COMET_VERSION.tar.gz"</span>
-<span class="gp"># </span>Unpack
-<span class="go">tar -xzf apache-datafusion-comet-$COMET_VERSION.tar.gz</span>
-<span class="go">cd apache-datafusion-comet-$COMET_VERSION</span>
-</pre></div>
-</div>
-<p>Build</p>
-<div class="highlight-console notranslate"><div
class="highlight"><pre><span></span><span class="go">make release-nogit
PROFILES="-Pspark-3.4"</span>
-</pre></div>
-</div>
+<section id="building-from-source">
+<h2>Building from source<a class="headerlink" href="#building-from-source"
title="Link to this heading">¶</a></h2>
+<p>Refer to the <a class="reference internal" href="source.html"><span
class="std std-doc">Building from Source</span></a> guide for instructions on
building Comet from source, either from official
+source releases or from the latest code in the GitHub repository.</p>
</section>
-<section id="building-from-the-github-repository">
-<h2>Building from the GitHub repository<a class="headerlink"
href="#building-from-the-github-repository" title="Link to this
heading">¶</a></h2>
-<p>Clone the repository:</p>
-<div class="highlight-console notranslate"><div
class="highlight"><pre><span></span><span class="go">git clone
https://github.com/apache/datafusion-comet.git</span>
-</pre></div>
-</div>
-<p>Build Comet for a specific Spark version:</p>
-<div class="highlight-console notranslate"><div
class="highlight"><pre><span></span><span class="go">cd datafusion-comet</span>
-<span class="go">make release PROFILES="-Pspark-3.4"</span>
-</pre></div>
-</div>
-<p>Note that the project builds for Scala 2.12 by default but can be built for
Scala 2.13 using an additional profile:</p>
-<div class="highlight-console notranslate"><div
class="highlight"><pre><span></span><span class="go">make release
PROFILES="-Pspark-3.4 -Pscala-2.13"</span>
-</pre></div>
-</div>
-<p>To build Comet from the source distribution on an isolated environment
without an access to <code class="docutils literal notranslate"><span
class="pre">github.com</span></code> it is necessary to disable <code
class="docutils literal notranslate"><span
class="pre">git-commit-id-maven-plugin</span></code>, otherwise you will face
errors that there is no access to the git during the build process. In that
case you may use:</p>
-<div class="highlight-console notranslate"><div
class="highlight"><pre><span></span><span class="go">make release-nogit
PROFILES="-Pspark-3.4"</span>
-</pre></div>
-</div>
+<section id="deploying-to-kubernetes">
+<h2>Deploying to Kubernetes<a class="headerlink"
href="#deploying-to-kubernetes" title="Link to this heading">¶</a></h2>
+<p>See the <a class="reference internal" href="kubernetes.html"><span
class="std std-doc">Comet Kubernetes Guide</span></a>.</p>
</section>
<section id="run-spark-shell-with-comet-enabled">
<h2>Run Spark Shell with Comet enabled<a class="headerlink"
href="#run-spark-shell-with-comet-enabled" title="Link to this
heading">¶</a></h2>
@@ -442,11 +444,10 @@ under the License.
<span class="w"> </span>--conf<span class="w">
</span>spark.driver.extraClassPath<span class="o">=</span><span
class="nv">$COMET_JAR</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--conf<span class="w">
</span>spark.executor.extraClassPath<span class="o">=</span><span
class="nv">$COMET_JAR</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--conf<span class="w"> </span>spark.plugins<span
class="o">=</span>org.apache.spark.CometPlugin<span class="w"> </span><span
class="se">\</span>
-<span class="w"> </span>--conf<span class="w">
</span>spark.comet.enabled<span class="o">=</span><span
class="nb">true</span><span class="w"> </span><span class="se">\</span>
-<span class="w"> </span>--conf<span class="w">
</span>spark.comet.exec.enabled<span class="o">=</span><span
class="nb">true</span><span class="w"> </span><span class="se">\</span>
-<span class="w"> </span>--conf<span class="w">
</span>spark.comet.explainFallback.enabled<span class="o">=</span><span
class="nb">true</span><span class="w"> </span><span class="se">\</span>
-<span class="w"> </span>--conf<span class="w">
</span>spark.driver.memory<span class="o">=</span>1g<span class="w">
</span><span class="se">\</span>
-<span class="w"> </span>--conf<span class="w">
</span>spark.executor.memory<span class="o">=</span>1g
+<span class="w"> </span>--conf<span class="w">
</span>spark.shuffle.manager<span
class="o">=</span>org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager<span
class="w"> </span><span class="se">\</span>
+<span class="go"> --conf spark.comet.explainFallback.enabled=true \</span>
+<span class="go"> --conf spark.memory.offHeap.enabled=true \</span>
+<span class="go"> --conf spark.memory.offHeap.size=16g</span>
</pre></div>
</div>
<section id="verify-comet-enabled-for-spark-sql-query">
@@ -481,31 +482,20 @@ being executed natively.</p>
</pre></div>
</div>
</section>
-<section id="enable-comet-shuffle">
-<h3>Enable Comet shuffle<a class="headerlink" href="#enable-comet-shuffle"
title="Link to this heading">¶</a></h3>
-<p>Comet shuffle feature is disabled by default. To enable it, please add
related configs:</p>
-<div class="highlight-default notranslate"><div
class="highlight"><pre><span></span><span class="o">--</span><span
class="n">conf</span> <span class="n">spark</span><span class="o">.</span><span
class="n">shuffle</span><span class="o">.</span><span
class="n">manager</span><span class="o">=</span><span class="n">org</span><span
class="o">.</span><span class="n">apache</span><span class="o">.</span><span
class="n">spark</span><span class="o">.</span><span class="n">sql</span><span
class="o [...]
-<span class="o">--</span><span class="n">conf</span> <span
class="n">spark</span><span class="o">.</span><span class="n">comet</span><span
class="o">.</span><span class="n">exec</span><span class="o">.</span><span
class="n">shuffle</span><span class="o">.</span><span
class="n">enabled</span><span class="o">=</span><span class="n">true</span>
-</pre></div>
-</div>
-<p>Above configs enable Comet native shuffle which only supports hash
partition and single partition.
-Comet native shuffle doesn’t support complex types yet.</p>
-<p>Comet doesn’t have official release yet so currently the only way to test
it is to build jar and include it in your
-Spark application. Depending on your deployment mode you may also need to set
the driver & executor class path(s) to
+</section>
+<section id="additional-configuration">
+<h2>Additional Configuration<a class="headerlink"
href="#additional-configuration" title="Link to this heading">¶</a></h2>
+<p>Depending on your deployment mode, you may also need to set the driver and
executor class path(s) to
explicitly contain Comet otherwise Spark may use a different class-loader for
the Comet components than its internal
components which will then fail at runtime. For example:</p>
<div class="highlight-default notranslate"><div
class="highlight"><pre><span></span><span class="o">--</span><span
class="n">driver</span><span class="o">-</span><span
class="n">class</span><span class="o">-</span><span class="n">path</span> <span
class="n">spark</span><span class="o">/</span><span
class="n">target</span><span class="o">/</span><span
class="n">comet</span><span class="o">-</span><span class="n">spark</span><span
class="o">-</span><span class="n">spark3</span><span class= [...]
</pre></div>
</div>
<p>Some cluster managers may require additional configuration, see <a
class="reference external"
href="https://spark.apache.org/docs/latest/cluster-overview.html">https://spark.apache.org/docs/latest/cluster-overview.html</a></p>
-<p>To enable columnar shuffle which supports all partitioning and basic
complex types, one more config is required:</p>
-<div class="highlight-default notranslate"><div
class="highlight"><pre><span></span><span class="o">--</span><span
class="n">conf</span> <span class="n">spark</span><span class="o">.</span><span
class="n">comet</span><span class="o">.</span><span class="n">exec</span><span
class="o">.</span><span class="n">shuffle</span><span class="o">.</span><span
class="n">mode</span><span class="o">=</span><span class="n">jvm</span>
-</pre></div>
-</div>
-</section>
<section id="memory-tuning">
<h3>Memory tuning<a class="headerlink" href="#memory-tuning" title="Link to
this heading">¶</a></h3>
-<p>In addition to Apache Spark memory configuration parameters the Comet
introduces own parameters to configure memory allocation for native execution.
More <a class="reference internal" href="tuning.html"><span class="std
std-doc">Comet Memory Tuning</span></a></p>
+<p>In addition to Apache Spark memory configuration parameters, Comet
introduces additional parameters to configure memory
+allocation for native execution. See <a class="reference internal"
href="tuning.html"><span class="std std-doc">Comet Memory Tuning</span></a> for
details.</p>
</section>
</section>
</section>
@@ -523,10 +513,10 @@ components which will then fail at runtime. For
example:</p>
<p class="prev-next-title">Comet Overview</p>
</div>
</a>
- <a class='right-next' id="next-link" href="datasources.html" title="next
page">
+ <a class='right-next' id="next-link" href="source.html" title="next page">
<div class="prev-next-info">
<p class="prev-next-subtitle">next</p>
- <p class="prev-next-title">Supported Spark Data Sources</p>
+ <p class="prev-next-title">Building Comet From Source</p>
</div>
<i class="fas fa-angle-right"></i>
</a>
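
Editor's sketch (not part of the commit): the options added to the spark-shell example in the installation.html hunks above can be collected as follows. The jar path default and the build_args helper are hypothetical names used only for illustration; the configuration keys and values are the ones visible in the diff.

```shell
# Hypothetical jar location; point COMET_JAR at your own Comet jar.
COMET_JAR="${COMET_JAR:-/path/to/comet-spark.jar}"

# Emit one "--conf key=value" pair per line, mirroring the updated example.
build_args() {
  for kv in \
    "spark.driver.extraClassPath=$COMET_JAR" \
    "spark.executor.extraClassPath=$COMET_JAR" \
    "spark.plugins=org.apache.spark.CometPlugin" \
    "spark.shuffle.manager=org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager" \
    "spark.comet.explainFallback.enabled=true" \
    "spark.memory.offHeap.enabled=true" \
    "spark.memory.offHeap.size=16g"
  do
    printf '%s %s\n' --conf "$kv"
  done
}

# Print the pairs so they can be reviewed before use.
build_args
```

The printed pairs can be appended to a spark-shell invocation. The start of that command line is not visible in this diff, so it is left out here.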
diff --git a/user-guide/kubernetes.html b/user-guide/kubernetes.html
index b1240e1a..4b520161 100644
--- a/user-guide/kubernetes.html
+++ b/user-guide/kubernetes.html
@@ -53,6 +53,8 @@ under the License.
<script async="true" defer="true"
src="https://buttons.github.io/buttons.js"></script>
<link rel="index" title="Index" href="../genindex.html" />
<link rel="search" title="Search" href="../search.html" />
+ <link rel="next" title="Supported Spark Data Sources"
href="datasources.html" />
+ <link rel="prev" title="Building Comet From Source" href="source.html" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<meta name="docsearch:language" content="en">
@@ -105,7 +107,7 @@ under the License.
User Guide
</span>
</p>
-<ul class="nav bd-sidenav">
+<ul class="current nav bd-sidenav">
<li class="toctree-l1">
<a class="reference internal" href="overview.html">
Comet Overview
@@ -116,6 +118,16 @@ under the License.
Installing Comet
</a>
</li>
+ <li class="toctree-l1">
+ <a class="reference internal" href="source.html">
+ Building From Source
+ </a>
+ </li>
+ <li class="toctree-l1 current active">
+ <a class="current reference internal" href="#">
+ Kubernetes Guide
+ </a>
+ </li>
<li class="toctree-l1">
<a class="reference internal" href="datasources.html">
Supported Data Sources
@@ -424,6 +436,20 @@ spec:
<!-- Previous / next buttons -->
<div class='prev-next-area'>
+ <a class='left-prev' id="prev-link" href="source.html" title="previous
page">
+ <i class="fas fa-angle-left"></i>
+ <div class="prev-next-info">
+ <p class="prev-next-subtitle">previous</p>
+ <p class="prev-next-title">Building Comet From Source</p>
+ </div>
+ </a>
+ <a class='right-next' id="next-link" href="datasources.html" title="next
page">
+ <div class="prev-next-info">
+ <p class="prev-next-subtitle">next</p>
+ <p class="prev-next-title">Supported Spark Data Sources</p>
+ </div>
+ <i class="fas fa-angle-right"></i>
+ </a>
</div>
</main>
diff --git a/user-guide/operators.html b/user-guide/operators.html
index 5af3326f..1e796b56 100644
--- a/user-guide/operators.html
+++ b/user-guide/operators.html
@@ -118,6 +118,16 @@ under the License.
Installing Comet
</a>
</li>
+ <li class="toctree-l1">
+ <a class="reference internal" href="source.html">
+ Building From Source
+ </a>
+ </li>
+ <li class="toctree-l1">
+ <a class="reference internal" href="kubernetes.html">
+ Kubernetes Guide
+ </a>
+ </li>
<li class="toctree-l1">
<a class="reference internal" href="datasources.html">
Supported Data Sources
diff --git a/user-guide/overview.html b/user-guide/overview.html
index 2b33409a..27bd4f55 100644
--- a/user-guide/overview.html
+++ b/user-guide/overview.html
@@ -118,6 +118,16 @@ under the License.
Installing Comet
</a>
</li>
+ <li class="toctree-l1">
+ <a class="reference internal" href="source.html">
+ Building From Source
+ </a>
+ </li>
+ <li class="toctree-l1">
+ <a class="reference internal" href="kubernetes.html">
+ Kubernetes Guide
+ </a>
+ </li>
<li class="toctree-l1">
<a class="reference internal" href="datasources.html">
Supported Data Sources
@@ -277,13 +287,13 @@ under the License.
</a>
</li>
<li class="toc-h2 nav-item toc-entry">
- <a class="reference internal nav-link"
href="#supported-apache-spark-versions">
- Supported Apache Spark versions
+ <a class="reference internal nav-link"
href="#feature-parity-with-apache-spark">
+ Feature Parity with Apache Spark
</a>
</li>
<li class="toc-h2 nav-item toc-entry">
- <a class="reference internal nav-link"
href="#feature-parity-with-apache-spark">
- Feature Parity with Apache Spark
+ <a class="reference internal nav-link" href="#getting-started">
+ Getting Started
</a>
</li>
</ul>
@@ -334,8 +344,11 @@ under the License.
-->
<section id="comet-overview">
<h1>Comet Overview<a class="headerlink" href="#comet-overview" title="Link to
this heading">¶</a></h1>
-<p>Comet runs Spark SQL queries using the native Apache DataFusion runtime,
which is
-typically faster and more resource efficient than JVM based runtimes.</p>
+<p>Apache DataFusion Comet is a high-performance accelerator for Apache Spark,
built on top of the powerful
+<a class="reference external" href="https://datafusion.apache.org">Apache
DataFusion</a> query engine. Comet is designed to significantly enhance the
+performance of Apache Spark workloads while leveraging commodity hardware and
seamlessly integrating with the
+Spark ecosystem without requiring any code changes.</p>
+<p>The following diagram provides an overview of Comet’s architecture.</p>
<p><img alt="Comet Overview" src="../_images/comet-overview.png" /></p>
<p>Comet aims to support:</p>
<ul class="simple">
@@ -347,25 +360,9 @@ Filter/Project/Aggregation/Join/Exchange etc.</p></li>
</ul>
<section id="architecture">
<h2>Architecture<a class="headerlink" href="#architecture" title="Link to this
heading">¶</a></h2>
-<p>The following diagram illustrates the architecture of Comet:</p>
+<p>The following diagram shows how Comet integrates with Apache Spark.</p>
<p><img alt="Comet System Diagram" src="../_images/comet-system-diagram.png"
/></p>
</section>
-<section id="supported-apache-spark-versions">
-<h2>Supported Apache Spark versions<a class="headerlink"
href="#supported-apache-spark-versions" title="Link to this heading">¶</a></h2>
-<p>Comet currently supports the following versions of Apache Spark:</p>
-<ul class="simple">
-<li><p>3.3.x</p></li>
-<li><p>3.4.x</p></li>
-<li><p>3.5.x</p></li>
-</ul>
-<p>Experimental support is provided for the following versions of Apache Spark
and is intended for development/testing
-use only and should not be used in production yet.</p>
-<ul class="simple">
-<li><p>4.0.0-preview1</p></li>
-</ul>
-<p>Note that Comet may not fully work with proprietary forks of Apache Spark
such as the Spark versions offered by
-Cloud Service Providers.</p>
-</section>
<section id="feature-parity-with-apache-spark">
<h2>Feature Parity with Apache Spark<a class="headerlink"
href="#feature-parity-with-apache-spark" title="Link to this heading">¶</a></h2>
<p>The project strives to keep feature parity with Apache Spark, that is,
@@ -377,6 +374,10 @@ features and fallback to Spark engine.</p>
Spark SQL tests and make sure they all pass with Comet extension
enabled.</p>
</section>
+<section id="getting-started">
+<h2>Getting Started<a class="headerlink" href="#getting-started" title="Link
to this heading">¶</a></h2>
+<p>Refer to the <a class="reference internal" href="installation.html"><span
class="std std-doc">Comet Installation Guide</span></a> to get started.</p>
+</section>
</section>
diff --git a/user-guide/datasources.html b/user-guide/source.html
similarity index 74%
copy from user-guide/datasources.html
copy to user-guide/source.html
index 2d459507..c610ba61 100644
--- a/user-guide/datasources.html
+++ b/user-guide/source.html
@@ -24,7 +24,7 @@ under the License.
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0"
/><meta name="viewport" content="width=device-width, initial-scale=1" />
- <title>Supported Spark Data Sources — Apache DataFusion Comet
documentation</title>
+ <title>Building Comet From Source — Apache DataFusion Comet
documentation</title>
<link href="../_static/styles/theme.css?digest=1999514e3f237ded88cf"
rel="stylesheet">
<link
href="../_static/styles/pydata-sphinx-theme.css?digest=1999514e3f237ded88cf"
rel="stylesheet">
@@ -53,7 +53,7 @@ under the License.
<script async="true" defer="true"
src="https://buttons.github.io/buttons.js"></script>
<link rel="index" title="Index" href="../genindex.html" />
<link rel="search" title="Search" href="../search.html" />
- <link rel="next" title="Supported Spark Data Types" href="datatypes.html"
/>
+ <link rel="next" title="Comet Kubernetes Support" href="kubernetes.html" />
<link rel="prev" title="Installing DataFusion Comet"
href="installation.html" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<meta name="docsearch:language" content="en">
@@ -120,6 +120,16 @@ under the License.
</li>
<li class="toctree-l1 current active">
<a class="current reference internal" href="#">
+ Building From Source
+ </a>
+ </li>
+ <li class="toctree-l1">
+ <a class="reference internal" href="kubernetes.html">
+ Kubernetes Guide
+ </a>
+ </li>
+ <li class="toctree-l1">
+ <a class="reference internal" href="datasources.html">
Supported Data Sources
</a>
</li>
@@ -272,18 +282,13 @@ under the License.
<nav id="bd-toc-nav">
<ul class="visible nav section-nav flex-column">
<li class="toc-h2 nav-item toc-entry">
- <a class="reference internal nav-link" href="#parquet">
- Parquet
- </a>
- </li>
- <li class="toc-h2 nav-item toc-entry">
- <a class="reference internal nav-link" href="#csv">
- CSV
+ <a class="reference internal nav-link"
href="#using-a-published-source-release">
+ Using a Published Source Release
</a>
</li>
<li class="toc-h2 nav-item toc-entry">
- <a class="reference internal nav-link" href="#json">
- JSON
+ <a class="reference internal nav-link"
href="#building-from-the-github-repository">
+ Building from the GitHub repository
</a>
</li>
</ul>
@@ -295,7 +300,7 @@ under the License.
<div class="tocsection editthispage">
- <a
href="https://github.com/apache/datafusion-comet/edit/main/docs/source/user-guide/datasources.md">
+ <a
href="https://github.com/apache/datafusion-comet/edit/main/docs/source/user-guide/source.md">
<i class="fas fa-pencil-alt"></i> Edit this page
</a>
</div>
@@ -332,24 +337,45 @@ under the License.
specific language governing permissions and limitations
under the License.
-->
-<section id="supported-spark-data-sources">
-<h1>Supported Spark Data Sources<a class="headerlink"
href="#supported-spark-data-sources" title="Link to this heading">¶</a></h1>
-<section id="parquet">
-<h2>Parquet<a class="headerlink" href="#parquet" title="Link to this
heading">¶</a></h2>
-<p>When <code class="docutils literal notranslate"><span
class="pre">spark.comet.scan.enabled</span></code> is enabled, Parquet scans
will be performed natively by Comet if all data types
-in the schema are supported. When this option is not enabled, the scan will
fall back to Spark. In this case,
-enabling <code class="docutils literal notranslate"><span
class="pre">spark.comet.convert.parquet.enabled</span></code> will immediately
convert the data into Arrow format, allowing native
-execution to happen after that, but the process may not be efficient.</p>
-</section>
-<section id="csv">
-<h2>CSV<a class="headerlink" href="#csv" title="Link to this
heading">¶</a></h2>
-<p>Comet does not provide native CSV scan, but when <code class="docutils
literal notranslate"><span
class="pre">spark.comet.convert.csv.enabled</span></code> is enabled, data is
immediately
-converted into Arrow format, allowing native execution to happen after
that.</p>
+<section id="building-comet-from-source">
+<h1>Building Comet From Source<a class="headerlink"
href="#building-comet-from-source" title="Link to this heading">¶</a></h1>
+<p>It is sometimes preferable to build from source for a specific platform.</p>
+<section id="using-a-published-source-release">
+<h2>Using a Published Source Release<a class="headerlink"
href="#using-a-published-source-release" title="Link to this heading">¶</a></h2>
+<p>Official source releases can be downloaded from
https://dist.apache.org/repos/dist/release/datafusion/</p>
+<div class="highlight-console notranslate"><div
class="highlight"><pre><span></span><span class="gp"># </span>Pick<span
class="w"> </span>the<span class="w"> </span>latest<span class="w">
</span>version
+<span class="go">export COMET_VERSION=0.3.0</span>
+<span class="gp"># </span>Download<span class="w"> </span>the<span class="w">
</span>tarball
+<span class="go">curl -O
"https://dist.apache.org/repos/dist/release/datafusion/datafusion-comet-$COMET_VERSION/apache-datafusion-comet-$COMET_VERSION.tar.gz"</span>
+<span class="gp"># </span>Unpack
+<span class="go">tar -xzf apache-datafusion-comet-$COMET_VERSION.tar.gz</span>
+<span class="go">cd apache-datafusion-comet-$COMET_VERSION</span>
+</pre></div>
+</div>
+<p>Build:</p>
+<div class="highlight-console notranslate"><div
class="highlight"><pre><span></span><span class="go">make release-nogit
PROFILES="-Pspark-3.4"</span>
+</pre></div>
+</div>
</section>
-<section id="json">
-<h2>JSON<a class="headerlink" href="#json" title="Link to this
heading">¶</a></h2>
-<p>Comet does not provide native JSON scan, but when <code class="docutils
literal notranslate"><span
class="pre">spark.comet.convert.json.enabled</span></code> is enabled, data is
immediately
-converted into Arrow format, allowing native execution to happen after
that.</p>
+<section id="building-from-the-github-repository">
+<h2>Building from the GitHub repository<a class="headerlink"
href="#building-from-the-github-repository" title="Link to this
heading">¶</a></h2>
+<p>Clone the repository:</p>
+<div class="highlight-console notranslate"><div
class="highlight"><pre><span></span><span class="go">git clone
https://github.com/apache/datafusion-comet.git</span>
+</pre></div>
+</div>
+<p>Build Comet for a specific Spark version:</p>
+<div class="highlight-console notranslate"><div
class="highlight"><pre><span></span><span class="go">cd datafusion-comet</span>
+<span class="go">make release PROFILES="-Pspark-3.4"</span>
+</pre></div>
+</div>
+<p>Note that the project builds for Scala 2.12 by default but can be built for
Scala 2.13 using an additional profile:</p>
+<div class="highlight-console notranslate"><div
class="highlight"><pre><span></span><span class="go">make release
PROFILES="-Pspark-3.4 -Pscala-2.13"</span>
+</pre></div>
+</div>
+<p>To build Comet from the source distribution in an isolated environment
without access to <code class="docutils literal notranslate"><span
class="pre">github.com</span></code>, you must disable <code
class="docutils literal notranslate"><span
class="pre">git-commit-id-maven-plugin</span></code>; otherwise the build fails
with errors because git is not accessible. In that case, use:</p>
+<div class="highlight-console notranslate"><div
class="highlight"><pre><span></span><span class="go">make release-nogit
PROFILES="-Pspark-3.4"</span>
+</pre></div>
+</div>
</section>
</section>
@@ -366,10 +392,10 @@ converted into Arrow format, allowing native execution to
happen after that.</p>
<p class="prev-next-title">Installing DataFusion Comet</p>
</div>
</a>
- <a class='right-next' id="next-link" href="datatypes.html" title="next
page">
+ <a class='right-next' id="next-link" href="kubernetes.html" title="next
page">
<div class="prev-next-info">
<p class="prev-next-subtitle">next</p>
- <p class="prev-next-title">Supported Spark Data Types</p>
+ <p class="prev-next-title">Comet Kubernetes Support</p>
</div>
<i class="fas fa-angle-right"></i>
</a>
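
Editor's sketch (not part of the commit): the source-release steps added in source.html above (download, unpack, build) could be wrapped in a small script. The helper names tarball_url and build_from_release, the SPARK_PROFILE variable, and the DO_BUILD switch are assumptions made for illustration; the URL layout, version, and make target are taken from the diff.

```shell
# Version and profile as shown in the new page; override via the environment.
COMET_VERSION="${COMET_VERSION:-0.3.0}"
SPARK_PROFILE="${SPARK_PROFILE:--Pspark-3.4}"

# Compose the release tarball URL for a given version.
tarball_url() {
  printf '%s\n' "https://dist.apache.org/repos/dist/release/datafusion/datafusion-comet-$1/apache-datafusion-comet-$1.tar.gz"
}

# Download, unpack, and build without git metadata; stops at the first failure.
build_from_release() {
  curl -fO "$(tarball_url "$COMET_VERSION")" &&
    tar -xzf "apache-datafusion-comet-$COMET_VERSION.tar.gz" &&
    cd "apache-datafusion-comet-$COMET_VERSION" &&
    make release-nogit PROFILES="$SPARK_PROFILE"
}

# Dry run by default: only print the URL. Set DO_BUILD=1 to run the build.
if [ "${DO_BUILD:-0}" = "1" ]; then
  build_from_release
else
  tarball_url "$COMET_VERSION"
fi
```

Keeping the build behind an explicit switch lets the URL be checked first, which may help when choosing a version other than the one shown.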
diff --git a/user-guide/tuning.html b/user-guide/tuning.html
index 81da3f96..0edd1c96 100644
--- a/user-guide/tuning.html
+++ b/user-guide/tuning.html
@@ -118,6 +118,16 @@ under the License.
Installing Comet
</a>
</li>
+ <li class="toctree-l1">
+ <a class="reference internal" href="source.html">
+ Building From Source
+ </a>
+ </li>
+ <li class="toctree-l1">
+ <a class="reference internal" href="kubernetes.html">
+ Kubernetes Guide
+ </a>
+ </li>
<li class="toctree-l1">
<a class="reference internal" href="datasources.html">
Supported Data Sources