This is an automated email from the ASF dual-hosted git repository.

sunchao pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/datafusion-comet.git


The following commit(s) were added to refs/heads/main by this push:
     new 686c7ade docs: Add a plugin overview page to the contributors guide (#345)
686c7ade is described below

commit 686c7adeb44944896981853ad3277291fcafe50b
Author: Andy Grove <[email protected]>
AuthorDate: Tue Apr 30 22:20:23 2024 -0600

    docs: Add a plugin overview page to the contributors guide (#345)
---
 docs/source/contributor-guide/debugging.md       |  2 +-
 docs/source/contributor-guide/plugin_overview.md | 59 ++++++++++++++++++++++++
 docs/source/index.rst                            |  5 +-
 3 files changed, 63 insertions(+), 3 deletions(-)

diff --git a/docs/source/contributor-guide/debugging.md b/docs/source/contributor-guide/debugging.md
index 3b20ed0b..38c396c1 100644
--- a/docs/source/contributor-guide/debugging.md
+++ b/docs/source/contributor-guide/debugging.md
@@ -99,7 +99,7 @@ https://mail.openjdk.org/pipermail/hotspot-dev/2019-September/039429.html
 Detecting the debugger
 
https://stackoverflow.com/questions/5393403/can-a-java-application-detect-that-a-debugger-is-attached#:~:text=No.,to%20let%20your%20app%20continue.&text=I%20know%20that%20those%20are,meant%20with%20my%20first%20phrase).
 
-# Verbose debug
+## Verbose debug
 
 By default, Comet outputs the exception details specific for Comet.
 
diff --git a/docs/source/contributor-guide/plugin_overview.md b/docs/source/contributor-guide/plugin_overview.md
new file mode 100644
index 00000000..8b48818e
--- /dev/null
+++ b/docs/source/contributor-guide/plugin_overview.md
@@ -0,0 +1,59 @@
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Comet Plugin Overview
+
+The entry point to Comet is the `org.apache.comet.CometSparkSessionExtensions` class, which can be registered with Spark by adding the following setting to the Spark configuration when launching `spark-shell` or `spark-submit`:
+
+```
+--conf spark.sql.extensions=org.apache.comet.CometSparkSessionExtensions
+```
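+
+The same extension can also be registered programmatically when building a `SparkSession`. The snippet below is a
+minimal sketch using the standard Spark API; it shows only the extension setting and omits any other configuration
+that a real Comet deployment may need:
+
+```scala
+// Minimal sketch: register the Comet extensions when constructing a SparkSession.
+// Only the extension setting shown above is configured here.
+import org.apache.spark.sql.SparkSession
+
+object CometSessionExample {
+  def main(args: Array[String]): Unit = {
+    val spark = SparkSession.builder()
+      .appName("comet-example")
+      .master("local[*]")
+      .config("spark.sql.extensions", "org.apache.comet.CometSparkSessionExtensions")
+      .getOrCreate()
+
+    // The Comet physical-plan rules are now registered for this session.
+    spark.sql("SELECT 1").show()
+    spark.stop()
+  }
+}
+```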
+
+On initialization, this class registers two physical plan optimization rules with Spark: `CometScanRule` and `CometExecRule`. These rules run whenever a query stage is being planned.
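+
+The exact wiring lives in `CometSparkSessionExtensions`, but conceptually it follows Spark's standard
+`SparkSessionExtensions` hooks. The sketch below is illustrative only: it assumes the columnar-rule hook and uses
+placeholder rule bodies, and is not a copy of the real Comet implementation:
+
+```scala
+// Illustrative sketch of how a Spark extension can register physical-plan rules.
+// The rule bodies are placeholders; the real CometScanRule and CometExecRule live in Comet.
+import org.apache.spark.sql.SparkSessionExtensions
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.execution.{ColumnarRule, SparkPlan}
+
+class ExampleExtensions extends (SparkSessionExtensions => Unit) {
+  override def apply(extensions: SparkSessionExtensions): Unit = {
+    extensions.injectColumnar { _ =>
+      new ColumnarRule {
+        // A natural place for scan replacement, before row/columnar transitions are inserted.
+        override def preColumnarTransitions: Rule[SparkPlan] = new Rule[SparkPlan] {
+          override def apply(plan: SparkPlan): SparkPlan = plan // placeholder for CometScanRule-style logic
+        }
+        // A natural place for operator replacement, after transitions are inserted.
+        override def postColumnarTransitions: Rule[SparkPlan] = new Rule[SparkPlan] {
+          override def apply(plan: SparkPlan): SparkPlan = plan // placeholder for CometExecRule-style logic
+        }
+      }
+    }
+  }
+}
+```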
+
+## CometScanRule
+
+`CometScanRule` replaces any Parquet scans with Comet Parquet scan classes.
+
+When the V1 data source API is being used, `FileSourceScanExec` is replaced with `CometScanExec`.
+
+When the V2 data source API is being used, `BatchScanExec` is replaced with `CometBatchScanExec`.
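+
+The following sketch illustrates the general shape of such a scan-replacement rule using Spark's plan-transform API.
+It is not the real `CometScanRule`: the conversion helpers are placeholders, and the real rule performs many more
+checks (for example, that the schema is supported) before replacing a scan:
+
+```scala
+// Illustrative sketch only: shows the shape of a scan-replacement rule.
+// The convert* helpers are placeholders; the real rule builds CometScanExec / CometBatchScanExec.
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.execution.{FileSourceScanExec, SparkPlan}
+import org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat
+import org.apache.spark.sql.execution.datasources.v2.BatchScanExec
+
+object ScanReplacementSketch extends Rule[SparkPlan] {
+  override def apply(plan: SparkPlan): SparkPlan = plan.transform {
+    // V1 data source API: Parquet scans are planned as FileSourceScanExec.
+    case scan: FileSourceScanExec if scan.relation.fileFormat.isInstanceOf[ParquetFileFormat] =>
+      convertV1Scan(scan)
+    // V2 data source API: scans are planned as BatchScanExec (a real rule would also
+    // check that the underlying scan is a Parquet scan).
+    case scan: BatchScanExec =>
+      convertV2Scan(scan)
+  }
+
+  // Placeholders that return the original node unchanged.
+  private def convertV1Scan(scan: FileSourceScanExec): SparkPlan = scan
+  private def convertV2Scan(scan: BatchScanExec): SparkPlan = scan
+}
+```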
+
+## CometExecRule
+
+`CometExecRule` attempts to transform a Spark physical plan into a Comet plan. This rule is executed against
+individual query stages when they are being prepared for execution.
+
+This rule traverses bottom-up from the original Spark plan and attempts to replace each node with a Comet equivalent.
+For example, a `ProjectExec` will be replaced by `CometProjectExec`.
+
+When replacing a node, various checks are performed to determine if Comet can support the operator and its expressions.
+If an operator, expression, or data type is not supported by Comet, the reason is stored in a tag on the
+underlying Spark node and the plan is not converted.
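+
+The sketch below illustrates this transform-and-tag pattern. The support check, the conversion, and the tag name are
+all placeholders invented for the example; the real `CometExecRule` covers many more operators and records more
+detailed reasons:
+
+```scala
+// Illustrative sketch of bottom-up replacement with fallback tagging.
+// isSupported, toComet, and the tag name are placeholders invented for this example.
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.catalyst.trees.TreeNodeTag
+import org.apache.spark.sql.execution.{ProjectExec, SparkPlan}
+
+object ExecReplacementSketch extends Rule[SparkPlan] {
+  // Hypothetical tag used to record why a node could not be converted.
+  val unsupportedReason: TreeNodeTag[String] = TreeNodeTag[String]("example.unsupported.reason")
+
+  override def apply(plan: SparkPlan): SparkPlan = plan.transformUp {
+    case project: ProjectExec if isSupported(project) =>
+      toComet(project) // the real rule would return a CometProjectExec here
+    case other if !isSupported(other) =>
+      // Record the reason on the underlying Spark node and leave it unchanged.
+      other.setTagValue(unsupportedReason, "operator or expression not supported")
+      other
+  }
+
+  // Placeholders standing in for Comet's real support checks and conversions.
+  private def isSupported(plan: SparkPlan): Boolean = true
+  private def toComet(plan: SparkPlan): SparkPlan = plan
+}
+```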
+
+Comet does not support partially replacing subsets of the plan within a query stage, because this would require adding
+transitions between row-based and columnar data wherever Spark operators and Comet operators meet, and the overhead
+of these transitions could outweigh the benefit of running parts of the query stage natively in Comet.
+
+Once the plan has been transformed, it is serialized into Comet protocol buffer format by the `QueryPlanSerde` class,
+and the serialized plan is passed to the native code by `CometExecIterator`.
+
+In the native code, a `PhysicalPlanner` struct (in `planner.rs`) converts the serialized plan into an
+Apache DataFusion physical plan. In some cases, Comet provides specialized physical operators and expressions
+that override the DataFusion versions to ensure compatibility with Apache Spark.
diff --git a/docs/source/index.rst b/docs/source/index.rst
index dfd19e59..5759fcf4 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -54,9 +54,10 @@ as a native runtime to achieve improvement in terms of query efficiency and quer
    :caption: Contributor Guide
 
    Getting Started <contributor-guide/contributing>
+   Comet Plugin Overview <contributor-guide/plugin_overview>
+   Development Guide <contributor-guide/development>
+   Debugging Guide <contributor-guide/debugging>
    Github and Issue Tracker <https://github.com/apache/datafusion-comet>
-   contributor-guide/development
-   contributor-guide/debugging
 
 .. _toc.asf-links:
 .. toctree::


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
