This is an automated email from the ASF dual-hosted git repository.
sunchao pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/datafusion-comet.git
The following commit(s) were added to refs/heads/main by this push:
new 686c7ade docs: Add a plugin overview page to the contributors guide (#345)
686c7ade is described below
commit 686c7adeb44944896981853ad3277291fcafe50b
Author: Andy Grove <[email protected]>
AuthorDate: Tue Apr 30 22:20:23 2024 -0600
docs: Add a plugin overview page to the contributors guide (#345)
---
docs/source/contributor-guide/debugging.md | 2 +-
docs/source/contributor-guide/plugin_overview.md | 59 ++++++++++++++++++++++++
docs/source/index.rst | 5 +-
3 files changed, 63 insertions(+), 3 deletions(-)
diff --git a/docs/source/contributor-guide/debugging.md b/docs/source/contributor-guide/debugging.md
index 3b20ed0b..38c396c1 100644
--- a/docs/source/contributor-guide/debugging.md
+++ b/docs/source/contributor-guide/debugging.md
@@ -99,7 +99,7 @@
https://mail.openjdk.org/pipermail/hotspot-dev/2019-September/039429.html
Detecting the debugger
https://stackoverflow.com/questions/5393403/can-a-java-application-detect-that-a-debugger-is-attached#:~:text=No.,to%20let%20your%20app%20continue.&text=I%20know%20that%20those%20are,meant%20with%20my%20first%20phrase).
-# Verbose debug
+## Verbose debug
By default, Comet outputs the exception details specific for Comet.
diff --git a/docs/source/contributor-guide/plugin_overview.md b/docs/source/contributor-guide/plugin_overview.md
new file mode 100644
index 00000000..8b48818e
--- /dev/null
+++ b/docs/source/contributor-guide/plugin_overview.md
@@ -0,0 +1,59 @@
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Comet Plugin Overview
+
+The entry point to Comet is the `org.apache.comet.CometSparkSessionExtensions` class, which can be registered with Spark by adding the following setting to the Spark configuration when launching `spark-shell` or `spark-submit`:
+
+```
+--conf spark.sql.extensions=org.apache.comet.CometSparkSessionExtensions
+```
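+
+The extension can also be registered programmatically when a `SparkSession` is constructed. The following is a minimal sketch using the standard `SparkSession` builder API; note that `spark.sql.extensions` is a static configuration and must be set before the session is created, and any additional Comet configuration is omitted here:
+
+```scala
+import org.apache.spark.sql.SparkSession
+
+// Register the Comet extensions when building the session instead of
+// passing --conf on the command line.
+val spark = SparkSession
+  .builder()
+  .appName("comet-example")
+  .config("spark.sql.extensions", "org.apache.comet.CometSparkSessionExtensions")
+  .getOrCreate()
+```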
+
+On initialization, this class registers two physical plan optimization rules with Spark: `CometScanRule` and `CometExecRule`. These rules run whenever a query stage is being planned.
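+
+To illustrate how a session extension registers such rules, the sketch below uses Spark's public `SparkSessionExtensions.injectColumnar` API with hypothetical rule and class names; the exact injection points used by `CometSparkSessionExtensions` are defined in the Comet source and may differ in detail:
+
+```scala
+import org.apache.spark.sql.{SparkSession, SparkSessionExtensions}
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.execution.{ColumnarRule, SparkPlan}
+
+// Hypothetical stand-ins for CometScanRule and CometExecRule.
+case class ExampleScanRule(session: SparkSession) extends Rule[SparkPlan] {
+  override def apply(plan: SparkPlan): SparkPlan = plan // would rewrite scans here
+}
+
+case class ExampleExecRule(session: SparkSession) extends Rule[SparkPlan] {
+  override def apply(plan: SparkPlan): SparkPlan = plan // would rewrite operators here
+}
+
+// A class named in spark.sql.extensions must be a function from
+// SparkSessionExtensions to Unit.
+class ExampleSessionExtensions extends (SparkSessionExtensions => Unit) {
+  override def apply(extensions: SparkSessionExtensions): Unit = {
+    extensions.injectColumnar { session =>
+      new ColumnarRule {
+        override def preColumnarTransitions: Rule[SparkPlan] = ExampleScanRule(session)
+        override def postColumnarTransitions: Rule[SparkPlan] = ExampleExecRule(session)
+      }
+    }
+  }
+}
+```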
+
+## CometScanRule
+
+`CometScanRule` replaces any Parquet scans with Comet Parquet scan classes.
+
+When the V1 data source API is being used, `FileSourceScanExec` is replaced with `CometScanExec`.
+
+When the V2 data source API is being used, `BatchScanExec` is replaced with `CometBatchScanExec`.
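+
+The general shape of such a scan-replacement rule is sketched below. This is illustrative only: the match arms return the original scan unchanged, whereas the real `CometScanRule` constructs the corresponding Comet operators and performs additional checks (for example, on schema and data types) before doing so:
+
+```scala
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.execution.{FileSourceScanExec, SparkPlan}
+import org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat
+import org.apache.spark.sql.execution.datasources.v2.BatchScanExec
+import org.apache.spark.sql.execution.datasources.v2.parquet.ParquetScan
+
+// Illustrative shape of a scan-replacement rule.
+object ExampleScanReplacement extends Rule[SparkPlan] {
+  override def apply(plan: SparkPlan): SparkPlan = plan.transform {
+    // V1 data source API: only Parquet scans are candidates.
+    case scan: FileSourceScanExec
+        if scan.relation.fileFormat.isInstanceOf[ParquetFileFormat] =>
+      scan // a CometScanExec would be returned here
+    // V2 data source API.
+    case scan: BatchScanExec if scan.scan.isInstanceOf[ParquetScan] =>
+      scan // a CometBatchScanExec would be returned here
+  }
+}
+```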
+
+## CometExecRule
+
+`CometExecRule` attempts to transform a Spark physical plan into a Comet plan. This rule is executed against individual query stages when they are being prepared for execution.
+
+This rule traverses the original Spark plan bottom-up and attempts to replace each node with a Comet equivalent. For example, a `ProjectExec` will be replaced by a `CometProjectExec`.
+
+When replacing a node, various checks are performed to determine whether Comet can support the operator and its expressions. If an operator, expression, or data type is not supported by Comet, the reason is stored in a tag on the underlying Spark node and the plan is not converted.
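+
+The bottom-up replacement and the fallback tag can be pictured with Spark's `transformUp` and `TreeNodeTag` APIs. In the sketch below, everything Comet-specific (the support check and the replacement operator) is a hypothetical placeholder:
+
+```scala
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.catalyst.trees.TreeNodeTag
+import org.apache.spark.sql.execution.{ProjectExec, SparkPlan}
+
+object ExampleExecReplacement extends Rule[SparkPlan] {
+  // Records why a node could not be converted.
+  private val unsupportedReason = TreeNodeTag[String]("example.unsupportedReason")
+
+  // Placeholder support check; the real rule inspects operators, expressions,
+  // and data types.
+  private def isSupported(project: ProjectExec): Boolean =
+    project.projectList.forall(_.deterministic)
+
+  override def apply(plan: SparkPlan): SparkPlan = plan.transformUp {
+    case project: ProjectExec if isSupported(project) =>
+      project // a CometProjectExec would be returned here
+    case project: ProjectExec =>
+      project.setTagValue(unsupportedReason, "unsupported project list")
+      project
+  }
+}
+```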
+
+Comet does not support partially replacing subsets of the plan within a query stage. Doing so would require transitions between row-based and columnar data wherever Spark operators and Comet operators meet, and the overhead of those transitions could outweigh the benefit of running parts of the query stage natively in Comet.
+
+Once the plan has been transformed, it is serialized into Comet's protocol buffer format by the `QueryPlanSerde` class, and the serialized plan is then passed to the native code by `CometExecIterator`.
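+
+The handoff can be pictured with the sketch below. Every name in it (`ExamplePlanSerde`, `ExampleNativeIterator`, and the stubbed native calls) is hypothetical; it only illustrates the general pattern of serializing the plan to bytes and pulling columnar batches back from native execution through an iterator:
+
+```scala
+import org.apache.spark.sql.execution.SparkPlan
+import org.apache.spark.sql.vectorized.ColumnarBatch
+
+// Hypothetical serializer; in Comet this role is played by QueryPlanSerde,
+// which produces Comet's protobuf representation of the plan.
+object ExamplePlanSerde {
+  def serializePlan(plan: SparkPlan): Array[Byte] = ???
+}
+
+// Hypothetical iterator; in Comet this role is played by CometExecIterator,
+// which hands the serialized plan to the native library and pulls back batches.
+class ExampleNativeIterator(serializedPlan: Array[Byte]) extends Iterator[ColumnarBatch] {
+  // In a real implementation these would be calls into native code.
+  private def createNativePlan(bytes: Array[Byte]): Long = ???
+  private def nextBatch(planId: Long): Option[ColumnarBatch] = ???
+
+  private val nativePlanId: Long = createNativePlan(serializedPlan)
+  private var current: Option[ColumnarBatch] = nextBatch(nativePlanId)
+
+  override def hasNext: Boolean = current.isDefined
+
+  override def next(): ColumnarBatch = {
+    val batch = current.get
+    current = nextBatch(nativePlanId)
+    batch
+  }
+}
+```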
+
+In the native code, the `PhysicalPlanner` struct (in `planner.rs`) converts the serialized plan into an Apache DataFusion physical plan. In some cases, Comet provides specialized physical operators and expressions that override the DataFusion versions to ensure compatibility with Apache Spark.
diff --git a/docs/source/index.rst b/docs/source/index.rst
index dfd19e59..5759fcf4 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -54,9 +54,10 @@ as a native runtime to achieve improvement in terms of query efficiency and quer
:caption: Contributor Guide
Getting Started <contributor-guide/contributing>
+ Comet Plugin Overview <contributor-guide/plugin_overview>
+ Development Guide <contributor-guide/development>
+ Debugging Guide <contributor-guide/debugging>
Github and Issue Tracker <https://github.com/apache/datafusion-comet>
- contributor-guide/development
- contributor-guide/debugging
.. _toc.asf-links:
.. toctree::
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]