This is an automated email from the ASF dual-hosted git repository.
agrove pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/datafusion-comet.git
The following commit(s) were added to refs/heads/main by this push:
new 837c256f docs: Various documentation improvements (#1005)
837c256f is described below
commit 837c256f0de16ea06b04bdc84503367b8a87be03
Author: Andy Grove <[email protected]>
AuthorDate: Tue Oct 8 15:16:12 2024 -0600
docs: Various documentation improvements (#1005)
* various documentation improvements
* add direct download urls
---
README.md | 4 +-
.../_static/images/CometNativeExecution.drawio.png | Bin 61017 -> 0 bytes
.../_static/images/CometNativeParquetReader.drawio | 100 +++++++++++++++++++
.../images/CometNativeParquetReader.drawio.svg | 4 +
.../images/CometNativeParquetScan.drawio.png | Bin 75703 -> 0 bytes
.../_static/images/CometOverviewDetailed.drawio | 94 ++++++++++++++++++
.../images/CometOverviewDetailed.drawio.svg | 4 +
docs/source/contributor-guide/plugin_overview.md | 4 +-
docs/source/index.rst | 2 +
docs/source/user-guide/installation.md | 107 +++++++--------------
docs/source/user-guide/overview.md | 34 +++----
docs/source/user-guide/source.md | 69 +++++++++++++
12 files changed, 329 insertions(+), 93 deletions(-)
diff --git a/README.md b/README.md
index c318b053..1a6281a9 100644
--- a/README.md
+++ b/README.md
@@ -30,10 +30,12 @@ under the License.
<img src="docs/source/_static/images/DataFusionComet-Logo-Light.png"
width="512" alt="logo"/>
Apache DataFusion Comet is a high-performance accelerator for Apache Spark, built on top of the powerful
-[Apache DataFusion](https://datafusion.apache.org) query engine. Comet is designed to significantly enhance the
+[Apache DataFusion] query engine. Comet is designed to significantly enhance the
performance of Apache Spark workloads while leveraging commodity hardware and seamlessly integrating with the
Spark ecosystem without requiring any code changes.
+[Apache DataFusion]: https://datafusion.apache.org
+
# Benefits of Using Comet
## Run Spark Queries at DataFusion Speeds
diff --git a/docs/source/_static/images/CometNativeExecution.drawio.png
b/docs/source/_static/images/CometNativeExecution.drawio.png
deleted file mode 100644
index ba122a1f..00000000
Binary files a/docs/source/_static/images/CometNativeExecution.drawio.png and
/dev/null differ
diff --git a/docs/source/_static/images/CometNativeParquetReader.drawio
b/docs/source/_static/images/CometNativeParquetReader.drawio
new file mode 100644
index 00000000..0c7304ef
--- /dev/null
+++ b/docs/source/_static/images/CometNativeParquetReader.drawio
@@ -0,0 +1,100 @@
+<mxfile host="app.diagrams.net" agent="Mozilla/5.0 (Macintosh; Intel Mac OS X
10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.6 Safari/605.1.15"
version="24.7.16">
+ <diagram name="Page-1" id="IdYZ_KFENTEXElLiOEKC">
+ <mxGraphModel dx="1133" dy="729" grid="1" gridSize="10" guides="1"
tooltips="1" connect="1" arrows="1" fold="1" page="1" pageScale="1"
pageWidth="850" pageHeight="1100" math="0" shadow="0">
+ <root>
+ <mxCell id="0" />
+ <mxCell id="1" parent="0" />
+ <mxCell id="wVAZ-YzccNhZugPFJvmi-30" value="Spark Executor"
style="rounded=1;whiteSpace=wrap;html=1;dashed=1;verticalAlign=top;" vertex="1"
parent="1">
+ <mxGeometry x="10" y="40" width="510" height="430" as="geometry" />
+ </mxCell>
+ <mxCell id="AH3lBTSLKK5181iXBnnY-2" value="JVM Code"
style="rounded=1;whiteSpace=wrap;html=1;verticalAlign=top;" parent="1"
vertex="1">
+ <mxGeometry x="30" y="70" width="210" height="380" as="geometry" />
+ </mxCell>
+ <mxCell id="wVAZ-YzccNhZugPFJvmi-24"
style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;exitX=0.75;exitY=1;exitDx=0;exitDy=0;entryX=0.75;entryY=0;entryDx=0;entryDy=0;"
edge="1" parent="1" source="t5OBkkhKOG6cYtw1sPyQ-18"
target="wVAZ-YzccNhZugPFJvmi-13">
+ <mxGeometry relative="1" as="geometry" />
+ </mxCell>
+ <mxCell id="t5OBkkhKOG6cYtw1sPyQ-18" value="Comet Parquet
Reader<div><br></div><div><br></div><div>IO
and Decompression</div>"
style="rounded=1;whiteSpace=wrap;html=1;verticalAlign=top;" parent="1"
vertex="1">
+ <mxGeometry x="45" y="110" width="180" height="100" as="geometry" />
+ </mxCell>
+ <mxCell id="wVAZ-YzccNhZugPFJvmi-1" value="Native Code"
style="rounded=1;whiteSpace=wrap;html=1;verticalAlign=top;" vertex="1"
parent="1">
+ <mxGeometry x="290" y="70" width="210" height="380" as="geometry" />
+ </mxCell>
+ <mxCell id="wVAZ-YzccNhZugPFJvmi-21"
style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;exitX=0;exitY=0.75;exitDx=0;exitDy=0;entryX=1;entryY=0.75;entryDx=0;entryDy=0;"
edge="1" parent="1" source="wVAZ-YzccNhZugPFJvmi-2"
target="wVAZ-YzccNhZugPFJvmi-13">
+ <mxGeometry relative="1" as="geometry" />
+ </mxCell>
+ <mxCell id="wVAZ-YzccNhZugPFJvmi-2" value="Native Execution Plan"
style="rounded=1;whiteSpace=wrap;html=1;verticalAlign=top;" vertex="1"
parent="1">
+ <mxGeometry x="310" y="240" width="170" height="100" as="geometry" />
+ </mxCell>
+ <mxCell id="wVAZ-YzccNhZugPFJvmi-19"
style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;exitX=0;exitY=0.75;exitDx=0;exitDy=0;entryX=1;entryY=0.75;entryDx=0;entryDy=0;"
edge="1" parent="1" source="wVAZ-YzccNhZugPFJvmi-4"
target="t5OBkkhKOG6cYtw1sPyQ-18">
+ <mxGeometry relative="1" as="geometry" />
+ </mxCell>
+ <mxCell id="wVAZ-YzccNhZugPFJvmi-4" value="Parquet Decoding"
style="rounded=1;whiteSpace=wrap;html=1;verticalAlign=top;" vertex="1"
parent="1">
+ <mxGeometry x="305" y="110" width="180" height="100" as="geometry" />
+ </mxCell>
+ <mxCell id="wVAZ-YzccNhZugPFJvmi-6" value=""
style="shape=image;verticalLabelPosition=bottom;labelBackgroundColor=default;verticalAlign=top;aspect=fixed;imageAspect=0;image=data:image/svg+xml,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIGZpbGw9Im5vbmUiIHZpZXdCb3g9IjAgMCA4MDEgMTY4IiBoZWlnaHQ9IjE2OCIgd2lkdGg9IjgwMSI+JiN4YTs8ZyBjbGlwLXBhdGg9InVybCgjY2xpcDBfMV8xODEpIj4mI3hhOzxwYXRoIGZpbGw9InVybCgjcGFpbnQwX2xpbmVhcl8xXzE4MSkiIGQ9Ik03Ni4xMjk3IDE2OEM4OC40NTk3IDE2OCA5OS42MDk3IDE1
[...]
+ <mxGeometry x="323.48" y="273.6" width="143.03" height="30"
as="geometry" />
+ </mxCell>
+ <mxCell id="wVAZ-YzccNhZugPFJvmi-7" value=""
style="shape=image;verticalLabelPosition=bottom;labelBackgroundColor=default;verticalAlign=top;aspect=fixed;imageAspect=0;image=data:image/png,iVBORw0KGgoAAAANSUhEUgAABwgAAAOoCAMAAADyHlBJAAADAFBMVEUAAAABAQECAgIDAwMEBAQFBQUGBgYHBwcICAgJCQkKCgoLCwsMDAwNDQ0ODg4PDw8QEBARERESEhITExMUFBQVFRUWFhYXFxcYGBgZGRkaGhobGxscHBwdHR0eHh4fHx8gICAhISEiIiIjIyMkJCQlJSUmJiYnJycoKCgpKSkqKiorKyssLCwtLS0uLi4vLy8wMDAxMTEyMjIzMzM0NDQ1NTU2NjY3Nzc4ODg5OTk6Ojo7Ozs8
[...]
+ <mxGeometry x="360" y="303.6" width="70" height="36.4" as="geometry"
/>
+ </mxCell>
+ <mxCell id="wVAZ-YzccNhZugPFJvmi-10" value=""
style="shape=flexArrow;endArrow=classic;html=1;rounded=0;exitX=0.5;exitY=1;exitDx=0;exitDy=0;entryX=0.5;entryY=0;entryDx=0;entryDy=0;"
edge="1" parent="1">
+ <mxGeometry width="50" height="50" relative="1" as="geometry">
+ <mxPoint x="394.5" y="340" as="sourcePoint" />
+ <mxPoint x="394.5" y="370" as="targetPoint" />
+ </mxGeometry>
+ </mxCell>
+ <mxCell id="wVAZ-YzccNhZugPFJvmi-11" value="Shuffle Files"
style="shape=process;whiteSpace=wrap;html=1;backgroundOutline=1;fillColor=#f5f5f5;fontColor=#333333;strokeColor=#666666;"
vertex="1" parent="1">
+ <mxGeometry x="310" y="370" width="170" height="50" as="geometry" />
+ </mxCell>
+ <mxCell id="wVAZ-YzccNhZugPFJvmi-20"
style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;exitX=1;exitY=0.25;exitDx=0;exitDy=0;entryX=0;entryY=0.25;entryDx=0;entryDy=0;"
edge="1" parent="1" source="wVAZ-YzccNhZugPFJvmi-13"
target="wVAZ-YzccNhZugPFJvmi-2">
+ <mxGeometry relative="1" as="geometry" />
+ </mxCell>
+ <mxCell id="wVAZ-YzccNhZugPFJvmi-28" value="executePlan()"
style="edgeLabel;html=1;align=center;verticalAlign=middle;resizable=0;points=[];"
vertex="1" connectable="0" parent="wVAZ-YzccNhZugPFJvmi-20">
+ <mxGeometry x="-0.1059" y="2" relative="1" as="geometry">
+ <mxPoint y="11" as="offset" />
+ </mxGeometry>
+ </mxCell>
+ <mxCell id="wVAZ-YzccNhZugPFJvmi-23"
style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;exitX=0.25;exitY=0;exitDx=0;exitDy=0;entryX=0.25;entryY=1;entryDx=0;entryDy=0;"
edge="1" parent="1" source="wVAZ-YzccNhZugPFJvmi-13"
target="t5OBkkhKOG6cYtw1sPyQ-18">
+ <mxGeometry relative="1" as="geometry" />
+ </mxCell>
+ <mxCell id="wVAZ-YzccNhZugPFJvmi-25"
style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;exitX=0.75;exitY=1;exitDx=0;exitDy=0;entryX=0.75;entryY=0;entryDx=0;entryDy=0;"
edge="1" parent="1" source="wVAZ-YzccNhZugPFJvmi-13"
target="wVAZ-YzccNhZugPFJvmi-14">
+ <mxGeometry relative="1" as="geometry" />
+ </mxCell>
+ <mxCell id="wVAZ-YzccNhZugPFJvmi-13" value="CometExecIterator"
style="rounded=1;whiteSpace=wrap;html=1;verticalAlign=middle;" vertex="1"
parent="1">
+ <mxGeometry x="45" y="240" width="180" height="100" as="geometry" />
+ </mxCell>
+ <mxCell id="wVAZ-YzccNhZugPFJvmi-22"
style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;exitX=0.25;exitY=0;exitDx=0;exitDy=0;entryX=0.25;entryY=1;entryDx=0;entryDy=0;"
edge="1" parent="1" source="wVAZ-YzccNhZugPFJvmi-14"
target="wVAZ-YzccNhZugPFJvmi-13">
+ <mxGeometry relative="1" as="geometry" />
+ </mxCell>
+ <mxCell id="wVAZ-YzccNhZugPFJvmi-26" value="next()"
style="edgeLabel;html=1;align=center;verticalAlign=middle;resizable=0;points=[];"
vertex="1" connectable="0" parent="wVAZ-YzccNhZugPFJvmi-22">
+ <mxGeometry x="0.0667" y="1" relative="1" as="geometry">
+ <mxPoint x="21" as="offset" />
+ </mxGeometry>
+ </mxCell>
+ <mxCell id="wVAZ-YzccNhZugPFJvmi-14" value="Spark Execution Logic"
style="rounded=1;whiteSpace=wrap;html=1;verticalAlign=middle;" vertex="1"
parent="1">
+ <mxGeometry x="45" y="370" width="180" height="40" as="geometry" />
+ </mxCell>
+ <mxCell id="wVAZ-YzccNhZugPFJvmi-15" value=""
style="shape=image;verticalLabelPosition=bottom;labelBackgroundColor=default;verticalAlign=top;aspect=fixed;imageAspect=0;image=data:image/png,iVBORw0KGgoAAAANSUhEUgAABwgAAAOoCAMAAADyHlBJAAADAFBMVEUAAAABAQECAgIDAwMEBAQFBQUGBgYHBwcICAgJCQkKCgoLCwsMDAwNDQ0ODg4PDw8QEBARERESEhITExMUFBQVFRUWFhYXFxcYGBgZGRkaGhobGxscHBwdHR0eHh4fHx8gICAhISEiIiIjIyMkJCQlJSUmJiYnJycoKCgpKSkqKiorKyssLCwtLS0uLi4vLy8wMDAxMTEyMjIzMzM0NDQ1NTU2NjY3Nzc4ODg5OTk6Ojo7Ozs
[...]
+ <mxGeometry x="360" y="173.60000000000002" width="70" height="36.4"
as="geometry" />
+ </mxCell>
+ <mxCell id="wVAZ-YzccNhZugPFJvmi-16" value=""
style="shape=flexArrow;endArrow=classic;html=1;rounded=0;exitX=0.5;exitY=1;exitDx=0;exitDy=0;entryX=0.5;entryY=0;entryDx=0;entryDy=0;"
edge="1" parent="1">
+ <mxGeometry width="50" height="50" relative="1" as="geometry">
+ <mxPoint x="394.5" y="210" as="sourcePoint" />
+ <mxPoint x="394.5" y="240" as="targetPoint" />
+ </mxGeometry>
+ </mxCell>
+ <mxCell id="wVAZ-YzccNhZugPFJvmi-18"
style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;exitX=1;exitY=0.25;exitDx=0;exitDy=0;entryX=0;entryY=0.25;entryDx=0;entryDy=0;"
edge="1" parent="1" source="t5OBkkhKOG6cYtw1sPyQ-18"
target="wVAZ-YzccNhZugPFJvmi-4">
+ <mxGeometry relative="1" as="geometry" />
+ </mxCell>
+ <mxCell id="wVAZ-YzccNhZugPFJvmi-29" value="decode()"
style="edgeLabel;html=1;align=center;verticalAlign=middle;resizable=0;points=[];"
vertex="1" connectable="0" parent="wVAZ-YzccNhZugPFJvmi-18">
+ <mxGeometry x="-0.025" y="-3" relative="1" as="geometry">
+ <mxPoint y="12" as="offset" />
+ </mxGeometry>
+ </mxCell>
+ <mxCell id="wVAZ-YzccNhZugPFJvmi-27" value="next()"
style="edgeLabel;html=1;align=center;verticalAlign=middle;resizable=0;points=[];"
vertex="1" connectable="0" parent="1">
+ <mxGeometry x="110" y="220" as="geometry" />
+ </mxCell>
+ </root>
+ </mxGraphModel>
+ </diagram>
+</mxfile>
diff --git a/docs/source/_static/images/CometNativeParquetReader.drawio.svg
b/docs/source/_static/images/CometNativeParquetReader.drawio.svg
new file mode 100644
index 00000000..0c1f93c7
--- /dev/null
+++ b/docs/source/_static/images/CometNativeParquetReader.drawio.svg
@@ -0,0 +1,4 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!-- Do not edit this file with editors other than draw.io -->
+<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN"
"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
+<svg xmlns="http://www.w3.org/2000/svg"
xmlns:xlink="http://www.w3.org/1999/xlink" version="1.1" width="511px"
height="431px" viewBox="-0.5 -0.5 511 431" content="<mxfile
host="app.diagrams.net" agent="Mozilla/5.0 (Macintosh; Intel Mac
OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.6
Safari/605.1.15" version="24.7.16" scale="1"
border="0"> <diagram name="Page-1"
id="IdYZ_KFENTEXElLiOEKC&quo [...]
\ No newline at end of file
diff --git a/docs/source/_static/images/CometNativeParquetScan.drawio.png
b/docs/source/_static/images/CometNativeParquetScan.drawio.png
deleted file mode 100644
index 712cbae4..00000000
Binary files a/docs/source/_static/images/CometNativeParquetScan.drawio.png and
/dev/null differ
diff --git a/docs/source/_static/images/CometOverviewDetailed.drawio
b/docs/source/_static/images/CometOverviewDetailed.drawio
new file mode 100644
index 00000000..ff7f4c59
--- /dev/null
+++ b/docs/source/_static/images/CometOverviewDetailed.drawio
@@ -0,0 +1,94 @@
+<mxfile host="app.diagrams.net" agent="Mozilla/5.0 (Macintosh; Intel Mac OS X
10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.6 Safari/605.1.15"
version="24.7.16">
+ <diagram name="Page-1" id="IdYZ_KFENTEXElLiOEKC">
+ <mxGraphModel dx="1193" dy="827" grid="1" gridSize="10" guides="1"
tooltips="1" connect="1" arrows="1" fold="1" page="1" pageScale="1"
pageWidth="850" pageHeight="1100" math="0" shadow="0">
+ <root>
+ <mxCell id="0" />
+ <mxCell id="1" parent="0" />
+ <mxCell id="AH3lBTSLKK5181iXBnnY-2" value="Spark Executor"
style="rounded=1;whiteSpace=wrap;html=1;verticalAlign=top;" parent="1"
vertex="1">
+ <mxGeometry x="290" width="210" height="430" as="geometry" />
+ </mxCell>
+ <mxCell id="AH3lBTSLKK5181iXBnnY-16" value="Spark Driver"
style="rounded=1;whiteSpace=wrap;html=1;verticalAlign=top;" parent="1"
vertex="1">
+ <mxGeometry y="40" width="200" height="350" as="geometry" />
+ </mxCell>
+ <mxCell id="AH3lBTSLKK5181iXBnnY-17" value=""
style="shape=image;verticalLabelPosition=bottom;labelBackgroundColor=default;verticalAlign=top;aspect=fixed;imageAspect=0;image=data:image/svg+xml,PHN2ZyB4bWxuczp4bGluaz0iaHR0cDovL3d3dy53My5vcmcvMTk5OS94bGluayIgeG1sbnM9Imh0dHA6Ly93d3cudzMub3JnLzIwMDAvc3ZnIiBzdHlsZT0iZmlsbC1ydWxlOmV2ZW5vZGQ7Y2xpcC1ydWxlOmV2ZW5vZGQ7c3Ryb2tlLWxpbmVqb2luOnJvdW5kO3N0cm9rZS1taXRlcmxpbWl0OjI7IiB4bWw6c3BhY2U9InByZXNlcnZlIiB2ZXJzaW9uPSIxLjEiIHZpZXdCb3g9IjAgMCA
[...]
+ <mxGeometry x="34.519999999999996" y="200" width="125.48"
height="30" as="geometry" />
+ </mxCell>
+ <mxCell id="t5OBkkhKOG6cYtw1sPyQ-1" value="Spark Logical Plan"
style="rounded=1;whiteSpace=wrap;html=1;" vertex="1" parent="1">
+ <mxGeometry x="10" y="80" width="180" height="30" as="geometry" />
+ </mxCell>
+ <mxCell id="t5OBkkhKOG6cYtw1sPyQ-2" value="Spark Physical Plan"
style="rounded=1;whiteSpace=wrap;html=1;" vertex="1" parent="1">
+ <mxGeometry x="10" y="140" width="180" height="30" as="geometry" />
+ </mxCell>
+ <mxCell id="t5OBkkhKOG6cYtw1sPyQ-3" value="Comet Physical Plan"
style="rounded=1;whiteSpace=wrap;html=1;verticalAlign=top;" vertex="1"
parent="1">
+ <mxGeometry x="10" y="260" width="180" height="100" as="geometry" />
+ </mxCell>
+ <mxCell id="t5OBkkhKOG6cYtw1sPyQ-4" value="protobuf intermediate
representation"
style="shape=process;whiteSpace=wrap;html=1;backgroundOutline=1;fillColor=#f5f5f5;fontColor=#333333;strokeColor=#666666;"
vertex="1" parent="1">
+ <mxGeometry x="40" y="290" width="120" height="50" as="geometry" />
+ </mxCell>
+ <mxCell id="t5OBkkhKOG6cYtw1sPyQ-12" value=""
style="shape=flexArrow;endArrow=classic;html=1;rounded=0;exitX=0.5;exitY=1;exitDx=0;exitDy=0;entryX=0.5;entryY=0;entryDx=0;entryDy=0;"
edge="1" parent="1" source="t5OBkkhKOG6cYtw1sPyQ-1"
target="t5OBkkhKOG6cYtw1sPyQ-2">
+ <mxGeometry width="50" height="50" relative="1" as="geometry">
+ <mxPoint x="270" y="270" as="sourcePoint" />
+ <mxPoint x="320" y="220" as="targetPoint" />
+ </mxGeometry>
+ </mxCell>
+ <mxCell id="t5OBkkhKOG6cYtw1sPyQ-13" value=""
style="shape=flexArrow;endArrow=classic;html=1;rounded=0;exitX=0.5;exitY=1;exitDx=0;exitDy=0;entryX=0.5;entryY=0;entryDx=0;entryDy=0;"
edge="1" parent="1">
+ <mxGeometry width="50" height="50" relative="1" as="geometry">
+ <mxPoint x="96.75999999999999" y="170" as="sourcePoint" />
+ <mxPoint x="96.75999999999999" y="200" as="targetPoint" />
+ </mxGeometry>
+ </mxCell>
+ <mxCell id="t5OBkkhKOG6cYtw1sPyQ-14" value=""
style="shape=flexArrow;endArrow=classic;html=1;rounded=0;exitX=0.5;exitY=1;exitDx=0;exitDy=0;entryX=0.5;entryY=0;entryDx=0;entryDy=0;"
edge="1" parent="1">
+ <mxGeometry width="50" height="50" relative="1" as="geometry">
+ <mxPoint x="96.75999999999999" y="230" as="sourcePoint" />
+ <mxPoint x="96.75999999999999" y="260" as="targetPoint" />
+ </mxGeometry>
+ </mxCell>
+ <mxCell id="t5OBkkhKOG6cYtw1sPyQ-15" value=""
style="shape=flexArrow;endArrow=classic;html=1;rounded=0;endWidth=28;endSize=9.67;width=11;fillColor=#000000;"
edge="1" parent="1">
+ <mxGeometry width="50" height="50" relative="1" as="geometry">
+ <mxPoint x="200" y="204.5" as="sourcePoint" />
+ <mxPoint x="290" y="204.5" as="targetPoint" />
+ </mxGeometry>
+ </mxCell>
+ <mxCell id="t5OBkkhKOG6cYtw1sPyQ-16" value="Native Execution Plan"
style="rounded=1;whiteSpace=wrap;html=1;verticalAlign=top;" vertex="1"
parent="1">
+ <mxGeometry x="310" y="230" width="170" height="100" as="geometry" />
+ </mxCell>
+ <mxCell id="t5OBkkhKOG6cYtw1sPyQ-17" value=""
style="shape=image;verticalLabelPosition=bottom;labelBackgroundColor=default;verticalAlign=top;aspect=fixed;imageAspect=0;image=data:image/svg+xml,PHN2ZyB4bWxuczp4bGluaz0iaHR0cDovL3d3dy53My5vcmcvMTk5OS94bGluayIgeG1sbnM9Imh0dHA6Ly93d3cudzMub3JnLzIwMDAvc3ZnIiBzdHlsZT0iZmlsbC1ydWxlOmV2ZW5vZGQ7Y2xpcC1ydWxlOmV2ZW5vZGQ7c3Ryb2tlLWxpbmVqb2luOnJvdW5kO3N0cm9rZS1taXRlcmxpbWl0OjI7IiB4bWw6c3BhY2U9InByZXNlcnZlIiB2ZXJzaW9uPSIxLjEiIHZpZXdCb3g9IjAgMCA
[...]
+ <mxGeometry x="332.26" y="170" width="125.48" height="30"
as="geometry" />
+ </mxCell>
+ <mxCell id="t5OBkkhKOG6cYtw1sPyQ-18" value="Comet Physical Plan"
style="rounded=1;whiteSpace=wrap;html=1;verticalAlign=top;" vertex="1"
parent="1">
+ <mxGeometry x="305" y="40" width="180" height="100" as="geometry" />
+ </mxCell>
+ <mxCell id="t5OBkkhKOG6cYtw1sPyQ-19" value="protobuf intermediate
representation"
style="shape=process;whiteSpace=wrap;html=1;backgroundOutline=1;fillColor=#f5f5f5;fontColor=#333333;strokeColor=#666666;"
vertex="1" parent="1">
+ <mxGeometry x="335" y="70" width="120" height="50" as="geometry" />
+ </mxCell>
+ <mxCell id="t5OBkkhKOG6cYtw1sPyQ-20" value=""
style="shape=image;verticalLabelPosition=bottom;labelBackgroundColor=default;verticalAlign=top;aspect=fixed;imageAspect=0;image=data:image/svg+xml,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIGZpbGw9Im5vbmUiIHZpZXdCb3g9IjAgMCA4MDEgMTY4IiBoZWlnaHQ9IjE2OCIgd2lkdGg9IjgwMSI+JiN4YTs8ZyBjbGlwLXBhdGg9InVybCgjY2xpcDBfMV8xODEpIj4mI3hhOzxwYXRoIGZpbGw9InVybCgjcGFpbnQwX2xpbmVhcl8xXzE4MSkiIGQ9Ik03Ni4xMjk3IDE2OEM4OC40NTk3IDE2OCA5OS42MDk3IDE
[...]
+ <mxGeometry x="323.48" y="263.6" width="143.03" height="30"
as="geometry" />
+ </mxCell>
+ <mxCell id="t5OBkkhKOG6cYtw1sPyQ-21" value=""
style="shape=image;verticalLabelPosition=bottom;labelBackgroundColor=default;verticalAlign=top;aspect=fixed;imageAspect=0;image=data:image/png,iVBORw0KGgoAAAANSUhEUgAABwgAAAOoCAMAAADyHlBJAAADAFBMVEUAAAABAQECAgIDAwMEBAQFBQUGBgYHBwcICAgJCQkKCgoLCwsMDAwNDQ0ODg4PDw8QEBARERESEhITExMUFBQVFRUWFhYXFxcYGBgZGRkaGhobGxscHBwdHR0eHh4fHx8gICAhISEiIiIjIyMkJCQlJSUmJiYnJycoKCgpKSkqKiorKyssLCwtLS0uLi4vLy8wMDAxMTEyMjIzMzM0NDQ1NTU2NjY3Nzc4ODg5OTk6Ojo7Ozs
[...]
+ <mxGeometry x="360" y="293.6" width="70" height="36.4" as="geometry"
/>
+ </mxCell>
+ <mxCell id="t5OBkkhKOG6cYtw1sPyQ-22" value=""
style="shape=flexArrow;endArrow=classic;html=1;rounded=0;exitX=0.5;exitY=1;exitDx=0;exitDy=0;entryX=0.5;entryY=0;entryDx=0;entryDy=0;"
edge="1" parent="1">
+ <mxGeometry width="50" height="50" relative="1" as="geometry">
+ <mxPoint x="394.5" y="140" as="sourcePoint" />
+ <mxPoint x="394.5" y="170" as="targetPoint" />
+ </mxGeometry>
+ </mxCell>
+ <mxCell id="t5OBkkhKOG6cYtw1sPyQ-23" value=""
style="shape=flexArrow;endArrow=classic;html=1;rounded=0;exitX=0.5;exitY=1;exitDx=0;exitDy=0;entryX=0.5;entryY=0;entryDx=0;entryDy=0;"
edge="1" parent="1" source="t5OBkkhKOG6cYtw1sPyQ-17"
target="t5OBkkhKOG6cYtw1sPyQ-16">
+ <mxGeometry width="50" height="50" relative="1" as="geometry">
+ <mxPoint x="140" y="210" as="sourcePoint" />
+ <mxPoint x="140" y="240" as="targetPoint" />
+ </mxGeometry>
+ </mxCell>
+ <mxCell id="t5OBkkhKOG6cYtw1sPyQ-24" value=""
style="shape=flexArrow;endArrow=classic;html=1;rounded=0;exitX=0.5;exitY=1;exitDx=0;exitDy=0;entryX=0.5;entryY=0;entryDx=0;entryDy=0;"
edge="1" parent="1">
+ <mxGeometry width="50" height="50" relative="1" as="geometry">
+ <mxPoint x="394.5" y="330" as="sourcePoint" />
+ <mxPoint x="394.5" y="360" as="targetPoint" />
+ </mxGeometry>
+ </mxCell>
+ <mxCell id="t5OBkkhKOG6cYtw1sPyQ-25" value="Shuffle Files"
style="shape=process;whiteSpace=wrap;html=1;backgroundOutline=1;fillColor=#f5f5f5;fontColor=#333333;strokeColor=#666666;"
vertex="1" parent="1">
+ <mxGeometry x="310" y="360" width="170" height="50" as="geometry" />
+ </mxCell>
+ </root>
+ </mxGraphModel>
+ </diagram>
+</mxfile>
diff --git a/docs/source/_static/images/CometOverviewDetailed.drawio.svg
b/docs/source/_static/images/CometOverviewDetailed.drawio.svg
new file mode 100644
index 00000000..0f29083b
--- /dev/null
+++ b/docs/source/_static/images/CometOverviewDetailed.drawio.svg
@@ -0,0 +1,4 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!-- Do not edit this file with editors other than draw.io -->
+<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN"
"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
+<svg xmlns="http://www.w3.org/2000/svg"
xmlns:xlink="http://www.w3.org/1999/xlink" version="1.1" width="501px"
height="431px" viewBox="-0.5 -0.5 501 431" content="<mxfile
host="app.diagrams.net" agent="Mozilla/5.0 (Macintosh; Intel Mac
OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.6
Safari/605.1.15" version="24.7.16" scale="1"
border="0"> <diagram name="Page-1"
id="IdYZ_KFENTEXElLiOEKC&quo [...]
\ No newline at end of file
diff --git a/docs/source/contributor-guide/plugin_overview.md
b/docs/source/contributor-guide/plugin_overview.md
index c7538290..a211ca6b 100644
--- a/docs/source/contributor-guide/plugin_overview.md
+++ b/docs/source/contributor-guide/plugin_overview.md
@@ -79,10 +79,10 @@ The leaf nodes in the physical plan are always `ScanExec` and these operators co
prepared before the plan is executed. When `CometExecIterator` invokes `Native.executePlan` it passes the memory
addresses of these Arrow arrays to the native code.
-
+
## End to End Flow
The following diagram shows the end-to-end flow.
-
+
diff --git a/docs/source/index.rst b/docs/source/index.rst
index 4bf5d9fd..39ad27a5 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -42,6 +42,8 @@ as a native runtime to achieve improvement in terms of query
efficiency and quer
Comet Overview <user-guide/overview>
Installing Comet <user-guide/installation>
+ Building From Source <user-guide/source>
+ Kubernetes Guide <user-guide/kubernetes>
Supported Data Sources <user-guide/datasources>
Supported Data Types <user-guide/datatypes>
Supported Operators <user-guide/operators>
diff --git a/docs/source/user-guide/installation.md
b/docs/source/user-guide/installation.md
index dc4429b8..343b6586 100644
--- a/docs/source/user-guide/installation.md
+++ b/docs/source/user-guide/installation.md
@@ -19,73 +19,54 @@
# Installing DataFusion Comet
+## Prerequisites
+
Make sure the following requirements are met and software installed on your machine.
-## Supported Platforms
+### Supported Operating Systems
- Linux
- Apple OSX (Intel and Apple Silicon)
-## Requirements
+### Supported Spark Versions
-- [Apache Spark supported by Comet](overview.md#supported-apache-spark-versions)
-- JDK 8 and up
-- GLIBC 2.17 (Centos 7) and up
+Comet currently supports the following versions of Apache Spark:
-## Deploying to Kubernetes
+- 3.3.x (Java 8/11/17, Scala 2.12/2.13)
+- 3.4.x (Java 8/11/17, Scala 2.12/2.13)
+- 3.5.x (Java 8/11/17, Scala 2.12/2.13)
-See the [Comet Kubernetes Guide](kubernetes.md) guide.
-
-## Using a Published JAR File
+Experimental support is provided for the following versions of Apache Spark and is intended for development/testing
+use only and should not be used in production yet.
-Pre-built jar files are available in Maven central at https://central.sonatype.com/namespace/org.apache.datafusion
+- 4.0.0-preview1 (Java 17/21, Scala 2.13)
-## Using a Published Source Release
-
-Official source releases can be downloaded from https://dist.apache.org/repos/dist/release/datafusion/
-
-```console
-# Pick the latest version
-export COMET_VERSION=0.3.0
-# Download the tarball
-curl -O "https://dist.apache.org/repos/dist/release/datafusion/datafusion-comet-$COMET_VERSION/apache-datafusion-comet-$COMET_VERSION.tar.gz"
-# Unpack
-tar -xzf apache-datafusion-comet-$COMET_VERSION.tar.gz
-cd apache-datafusion-comet-$COMET_VERSION
-```
+Note that Comet may not fully work with proprietary forks of Apache Spark such as the Spark versions offered by
+Cloud Service Providers.
-Build
-
-```console
-make release-nogit PROFILES="-Pspark-3.4"
-```
-
-## Building from the GitHub repository
+## Using a Published JAR File
-Clone the repository:
+Comet jar files are available in [Maven Central](https://central.sonatype.com/namespace/org.apache.datafusion).
-```console
-git clone https://github.com/apache/datafusion-comet.git
-```
+Here are the direct links for downloading the Comet jar file.
-Build Comet for a specific Spark version:
+- [Comet plugin for Spark 3.3 / Scala 2.12](https://repo1.maven.org/maven2/org/apache/datafusion/comet-spark-spark3.3_2.12/0.3.0/comet-spark-spark3.3_2.12-0.3.0.jar)
+- [Comet plugin for Spark 3.3 / Scala 2.13](https://repo1.maven.org/maven2/org/apache/datafusion/comet-spark-spark3.3_2.13/0.3.0/comet-spark-spark3.3_2.13-0.3.0.jar)
+- [Comet plugin for Spark 3.4 / Scala 2.12](https://repo1.maven.org/maven2/org/apache/datafusion/comet-spark-spark3.4_2.12/0.3.0/comet-spark-spark3.4_2.12-0.3.0.jar)
+- [Comet plugin for Spark 3.4 / Scala 2.13](https://repo1.maven.org/maven2/org/apache/datafusion/comet-spark-spark3.4_2.13/0.3.0/comet-spark-spark3.4_2.13-0.3.0.jar)
+- [Comet plugin for Spark 3.5 / Scala 2.12](https://repo1.maven.org/maven2/org/apache/datafusion/comet-spark-spark3.5_2.12/0.3.0/comet-spark-spark3.5_2.12-0.3.0.jar)
+- [Comet plugin for Spark 3.5 / Scala 2.13](https://repo1.maven.org/maven2/org/apache/datafusion/comet-spark-spark3.5_2.13/0.3.0/comet-spark-spark3.5_2.13-0.3.0.jar)
-```console
-cd datafusion-comet
-make release PROFILES="-Pspark-3.4"
-```
+## Building from source
-Note that the project builds for Scala 2.12 by default but can be built for Scala 2.13 using an additional profile:
+Refer to the [Building from Source] guide for instructions on building Comet from source, either from official
+source releases or from the latest code in the GitHub repository.
-```console
-make release PROFILES="-Pspark-3.4 -Pscala-2.13"
-```
+[Building from Source]: source.md
-To build Comet from the source distribution on an isolated environment without an access to `github.com` it is necessary to disable `git-commit-id-maven-plugin`, otherwise you will face errors that there is no access to the git during the build process. In that case you may use:
+## Deploying to Kubernetes
-```console
-make release-nogit PROFILES="-Pspark-3.4"
-```
+See the [Comet Kubernetes Guide](kubernetes.md).
## Run Spark Shell with Comet enabled
@@ -99,11 +80,10 @@ $SPARK_HOME/bin/spark-shell \
--conf spark.driver.extraClassPath=$COMET_JAR \
--conf spark.executor.extraClassPath=$COMET_JAR \
--conf spark.plugins=org.apache.spark.CometPlugin \
- --conf spark.comet.enabled=true \
- --conf spark.comet.exec.enabled=true \
+ --conf spark.shuffle.manager=org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager \
--conf spark.comet.explainFallback.enabled=true \
- --conf spark.driver.memory=1g \
- --conf spark.executor.memory=1g
+ --conf spark.memory.offHeap.enabled=true \
+ --conf spark.memory.offHeap.size=16g
```
### Verify Comet enabled for Spark SQL query
@@ -142,20 +122,9 @@ WARN CometSparkSessionExtensions$CometExecRule: Comet
cannot execute some parts
- Execute InsertIntoHadoopFsRelationCommand is not supported
```
-### Enable Comet shuffle
+## Additional Configuration
-Comet shuffle feature is disabled by default. To enable it, please add related configs:
-
-```
---conf spark.shuffle.manager=org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager
---conf spark.comet.exec.shuffle.enabled=true
-```
-
-Above configs enable Comet native shuffle which only supports hash partition and single partition.
-Comet native shuffle doesn't support complex types yet.
-
-Comet doesn't have official release yet so currently the only way to test it is to build jar and include it in your
-Spark application. Depending on your deployment mode you may also need to set the driver & executor class path(s) to
+Depending on your deployment mode you may also need to set the driver & executor class path(s) to
explicitly contain Comet otherwise Spark may use a different class-loader for the Comet components than its internal
components which will then fail at runtime. For example:
@@ -165,11 +134,7 @@ components which will then fail at runtime. For example:
Some cluster managers may require additional configuration, see <https://spark.apache.org/docs/latest/cluster-overview.html>
-To enable columnar shuffle which supports all partitioning and basic complex types, one more config is required:
-
-```
---conf spark.comet.exec.shuffle.mode=jvm
-```
-
### Memory tuning
-In addition to Apache Spark memory configuration parameters the Comet introduces own parameters to configure memory allocation for native execution. More [Comet Memory Tuning](./tuning.md)
+
+In addition to Apache Spark memory configuration parameters, Comet introduces additional parameters to configure memory
+allocation for native execution. See [Comet Memory Tuning](./tuning.md) for details.
diff --git a/docs/source/user-guide/overview.md
b/docs/source/user-guide/overview.md
index e386aec8..92dfe2bb 100644
--- a/docs/source/user-guide/overview.md
+++ b/docs/source/user-guide/overview.md
@@ -19,8 +19,14 @@
# Comet Overview
-Comet runs Spark SQL queries using the native Apache DataFusion runtime, which is
-typically faster and more resource efficient than JVM based runtimes.
+Apache DataFusion Comet is a high-performance accelerator for Apache Spark, built on top of the powerful
+[Apache DataFusion] query engine. Comet is designed to significantly enhance the
+performance of Apache Spark workloads while leveraging commodity hardware and seamlessly integrating with the
+Spark ecosystem without requiring any code changes.
+
+[Apache DataFusion]: https://datafusion.apache.org
+
+The following diagram provides an overview of Comet's architecture.

@@ -34,26 +40,10 @@ Comet aims to support:
## Architecture
-The following diagram illustrates the architecture of Comet:
+The following diagram shows how Comet integrates with Apache Spark.

-## Supported Apache Spark versions
-
-Comet currently supports the following versions of Apache Spark:
-
-- 3.3.x
-- 3.4.x
-- 3.5.x
-
-Experimental support is provided for the following versions of Apache Spark and is intended for development/testing
-use only and should not be used in production yet.
-
-- 4.0.0-preview1
-
-Note that Comet may not fully work with proprietary forks of Apache Spark such as the Spark versions offered by
-Cloud Service Providers.
-
## Feature Parity with Apache Spark
The project strives to keep feature parity with Apache Spark, that is,
@@ -65,3 +55,9 @@ features and fallback to Spark engine.
To achieve this, besides unit tests within Comet itself, we also re-use
Spark SQL tests and make sure they all pass with Comet extension
enabled.
+
+## Getting Started
+
+Refer to the [Comet Installation Guide] to get started.
+
+[Comet Installation Guide]: installation.md
diff --git a/docs/source/user-guide/source.md b/docs/source/user-guide/source.md
new file mode 100644
index 00000000..71c9060c
--- /dev/null
+++ b/docs/source/user-guide/source.md
@@ -0,0 +1,69 @@
+<!---
+ Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+-->
+
+# Building Comet From Source
+
+It is sometimes preferable to build from source for a specific platform.
+
+## Using a Published Source Release
+
+Official source releases can be downloaded from https://dist.apache.org/repos/dist/release/datafusion/
+
+```console
+# Pick the latest version
+export COMET_VERSION=0.3.0
+# Download the tarball
+curl -O "https://dist.apache.org/repos/dist/release/datafusion/datafusion-comet-$COMET_VERSION/apache-datafusion-comet-$COMET_VERSION.tar.gz"
+# Unpack
+tar -xzf apache-datafusion-comet-$COMET_VERSION.tar.gz
+cd apache-datafusion-comet-$COMET_VERSION
+```
+
+Build
+
+```console
+make release-nogit PROFILES="-Pspark-3.4"
+```
+
+## Building from the GitHub repository
+
+Clone the repository:
+
+```console
+git clone https://github.com/apache/datafusion-comet.git
+```
+
+Build Comet for a specific Spark version:
+
+```console
+cd datafusion-comet
+make release PROFILES="-Pspark-3.4"
+```
+
+Note that the project builds for Scala 2.12 by default but can be built for Scala 2.13 using an additional profile:
+
+```console
+make release PROFILES="-Pspark-3.4 -Pscala-2.13"
+```
+
+To build Comet from the source distribution in an isolated environment without access to `github.com`, it is necessary to disable `git-commit-id-maven-plugin`; otherwise the build fails with errors about git being inaccessible. In that case you may use:
+
+```console
+make release-nogit PROFILES="-Pspark-3.4"
+```
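
The direct download links added to `installation.md` in this commit all follow the same Maven Central layout, so the URL for a given Spark/Scala pairing can be derived rather than copied. The following is a minimal sketch, assuming the artifact naming seen in those links (`comet-spark-spark<spark>_<scala>`) also holds for other releases; `comet_jar_url` is a hypothetical helper written for illustration and is not part of Comet.

```shell
#!/bin/sh
# Hypothetical helper (not part of Comet): prints the Maven Central URL of a
# Comet plugin jar, following the pattern of the links listed in installation.md.
comet_jar_url() {
  spark_version="$1"   # Spark minor version, e.g. 3.4
  scala_version="$2"   # Scala binary version, e.g. 2.12
  comet_version="$3"   # Comet release, e.g. 0.3.0
  artifact="comet-spark-spark${spark_version}_${scala_version}"
  printf '%s\n' "https://repo1.maven.org/maven2/org/apache/datafusion/${artifact}/${comet_version}/${artifact}-${comet_version}.jar"
}

# Print the URL for Spark 3.4 / Scala 2.12 / Comet 0.3.0.
comet_jar_url 3.4 2.12 0.3.0
```

The printed URL could then be passed to `curl -O`, with the downloaded file assigned to the `COMET_JAR` variable that the spark-shell example expects. Whether a jar actually exists for a given combination still has to be checked against Maven Central.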