This is an automated email from the ASF dual-hosted git repository.
github-bot pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/datafusion-comet.git
The following commit(s) were added to refs/heads/asf-site by this push:
new d7c5cbff Publish built docs triggered by e297d23bd38bc306c90ed21a154d1495f985683e
d7c5cbff is described below
commit d7c5cbff4223e229169a64d0ae118529d4840469
Author: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
AuthorDate: Wed Dec 18 17:50:42 2024 +0000
Publish built docs triggered by e297d23bd38bc306c90ed21a154d1495f985683e
---
_sources/index.rst.txt | 1 +
_sources/user-guide/metrics.md.txt | 66 +++++++
_sources/user-guide/tuning.md.txt | 25 ---
contributor-guide/adding_a_new_expression.html | 5 +
contributor-guide/benchmark-results/tpc-ds.html | 5 +
contributor-guide/benchmark-results/tpc-h.html | 5 +
contributor-guide/benchmarking.html | 5 +
contributor-guide/contributing.html | 11 +-
contributor-guide/debugging.html | 5 +
contributor-guide/development.html | 5 +
contributor-guide/plugin_overview.html | 5 +
contributor-guide/profiling_native_code.html | 5 +
contributor-guide/spark-sql-tests.html | 5 +
genindex.html | 5 +
index.html | 6 +
objects.inv | Bin 773 -> 785 bytes
search.html | 5 +
searchindex.js | 2 +-
user-guide/compatibility.html | 5 +
user-guide/configs.html | 5 +
user-guide/datasources.html | 5 +
user-guide/datatypes.html | 5 +
user-guide/expressions.html | 5 +
user-guide/installation.html | 5 +
user-guide/kubernetes.html | 5 +
user-guide/{overview.html => metrics.html} | 220 +++++++++++++++++-------
user-guide/operators.html | 5 +
user-guide/overview.html | 5 +
user-guide/source.html | 5 +
user-guide/tuning.html | 72 +-------
30 files changed, 349 insertions(+), 159 deletions(-)
diff --git a/_sources/index.rst.txt b/_sources/index.rst.txt
index 39ad27a5..21ec36ca 100644
--- a/_sources/index.rst.txt
+++ b/_sources/index.rst.txt
@@ -51,6 +51,7 @@ as a native runtime to achieve improvement in terms of query efficiency and quer
Configuration Settings <user-guide/configs>
Compatibility Guide <user-guide/compatibility>
Tuning Guide <user-guide/tuning>
+ Metrics Guide <user-guide/metrics>
.. _toc.contributor-guide-links:
.. toctree::
diff --git a/_sources/user-guide/metrics.md.txt b/_sources/user-guide/metrics.md.txt
new file mode 100644
index 00000000..509d0ae8
--- /dev/null
+++ b/_sources/user-guide/metrics.md.txt
@@ -0,0 +1,66 @@
+<!---
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Comet Metrics
+
+## Spark SQL Metrics
+
+Set `spark.comet.metrics.detailed=true` to see all available Comet metrics.
+
+### CometScanExec
+
+| Metric      | Description |
+| ----------- | ----------- |
+| `scan time` | Total time to scan Parquet files. This is not directly comparable to the same metric in Spark. Both Comet and Spark measure the time in nanoseconds, but Spark converts it to whole milliseconds per batch, losing sub-millisecond precision, while Comet does not. |
+
+### Exchange
+
+Comet adds some additional metrics:
+
+| Metric                          | Description |
+| ------------------------------- | ------------------------------------------------------------- |
+| `native shuffle time`           | Total time in native code excluding any child operators. |
+| `repartition time`              | Time to repartition batches. |
+| `memory pool time`              | Time spent interacting with the memory pool. |
+| `encoding and compression time` | Time to encode batches in IPC format and compress using ZSTD. |
+
+## Native Metrics
+
+Setting `spark.comet.explain.native.enabled=true` causes native plans to be logged in each executor, along with the
+metrics for each plan. Because there is one native plan per task, this logging is very verbose.
+
+Here is a guide to some of the native metrics.
+
+### ScanExec
+
+| Metric            | Description |
+| ----------------- | ---------------------------------------------------------------------------------------------------- |
+| `elapsed_compute` | Total time spent in this operator, fetching batches from a JVM iterator. |
+| `jvm_fetch_time`  | Time spent in the JVM fetching input batches to be read by this `ScanExec` instance. |
+| `arrow_ffi_time`  | Time spent using Arrow FFI to create Arrow batches from the memory addresses returned from the JVM. |
+
+### ShuffleWriterExec
+
+| Metric            | Description |
+| ----------------- | ------------------------------------------------------------- |
+| `elapsed_compute` | Total time excluding any child operators. |
+| `repart_time`     | Time to repartition batches. |
+| `ipc_time`        | Time to encode batches in IPC format and compress using ZSTD. |
+| `mempool_time`    | Time spent interacting with the memory pool. |
+| `write_time`      | Time spent writing bytes to disk. |
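The `scan time` note in the new metrics guide is easier to see with numbers. The following Python sketch is illustrative only: the batch durations are made up, and it assumes the per-batch conversion simply drops each batch's sub-millisecond remainder (integer division), which is one way a millisecond-granularity timer loses precision. It compares accumulating raw nanoseconds, as the guide describes for Comet, with converting each batch to whole milliseconds before summing.

```python
"""Illustrative sketch: why per-batch millisecond conversion loses scan-time precision.

Assumption: each batch duration is converted to whole milliseconds by dropping the
sub-millisecond remainder before it is added to the total. Durations are made up.
"""

# Hypothetical per-batch scan durations in nanoseconds (not real measurements),
# repeated to simulate 4,000 batches.
BATCH_SCAN_NS = [400_000, 750_000, 1_900_000, 999_999] * 1_000

NS_PER_MS = 1_000_000


def total_ns_accumulated(durations_ns):
    """Sum raw nanosecond durations with no per-batch conversion."""
    return sum(durations_ns)


def total_ns_after_per_batch_ms(durations_ns):
    """Convert each duration to whole milliseconds, sum, then express in nanoseconds."""
    return sum(d // NS_PER_MS for d in durations_ns) * NS_PER_MS


if __name__ == "__main__":
    exact = total_ns_accumulated(BATCH_SCAN_NS)
    coarse = total_ns_after_per_batch_ms(BATCH_SCAN_NS)
    print(f"nanosecond accumulation: {exact / 1e9:.3f} s")  # 4.050 s
    print(f"per-batch milliseconds:  {coarse / 1e9:.3f} s")  # 1.000 s
```

In this made-up case most batches finish in under a millisecond, so the per-batch conversion reports roughly a quarter of the time actually spent. Real workloads will differ, which is why the two `scan time` values should not be compared directly.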
diff --git a/_sources/user-guide/tuning.md.txt b/_sources/user-guide/tuning.md.txt
index f10a0dde..d68481d1 100644
--- a/_sources/user-guide/tuning.md.txt
+++ b/_sources/user-guide/tuning.md.txt
@@ -103,31 +103,6 @@ native shuffle currently only supports `HashPartitioning` and `SinglePartitionin
To enable native shuffle, set `spark.comet.exec.shuffle.mode` to `native`. If this mode is explicitly set,
then any shuffle operations that cannot be supported in this mode will fall back to Spark.
-## Metrics
-
-### Spark SQL Metrics
-
-Some Comet metrics are not directly comparable to Spark metrics in some cases:
-
-- `CometScanExec` uses nanoseconds for total scan time. Spark also measures scan time in nanoseconds but converts to
-  milliseconds _per batch_ which can result in a large loss of precision, making it difficult to compare scan times
-  between Spark and Comet.
-### Native Metrics
-
-Setting `spark.comet.explain.native.enabled=true` will cause native plans to be logged in each executor. Metrics are
-logged for each native plan (and there is one plan per task, so this is very verbose).
-
-Here is a guide to some of the native metrics.
-
-### ScanExec
-
-| Metric            | Description |
-| ----------------- | ---------------------------------------------------------------------------------------------------- |
-| `elapsed_compute` | Total time spent in this operator, fetching batches from a JVM iterator. |
-| `jvm_fetch_time`  | Time spent in the JVM fetching input batches to be read by this `ScanExec` instance. |
-| `arrow_ffi_time`  | Time spent using Arrow FFI to create Arrow batches from the memory addresses returned from the JVM. |
-
## Explain Plan
### Extended Explain
With Spark 4.0.0 and newer, Comet can provide extended explain plan information in the Spark UI. Currently this lists
diff --git a/contributor-guide/adding_a_new_expression.html b/contributor-guide/adding_a_new_expression.html
index c563c421..9e4cdfe8 100644
--- a/contributor-guide/adding_a_new_expression.html
+++ b/contributor-guide/adding_a_new_expression.html
@@ -163,6 +163,11 @@ under the License.
Tuning Guide
</a>
</li>
+ <li class="toctree-l1">
+ <a class="reference internal" href="../user-guide/metrics.html">
+ Metrics Guide
+ </a>
+ </li>
</ul>
<p aria-level="2" class="caption" role="heading">
<span class="caption-text">
diff --git a/contributor-guide/benchmark-results/tpc-ds.html b/contributor-guide/benchmark-results/tpc-ds.html
index e553aee3..4be70258 100644
--- a/contributor-guide/benchmark-results/tpc-ds.html
+++ b/contributor-guide/benchmark-results/tpc-ds.html
@@ -161,6 +161,11 @@ under the License.
Tuning Guide
</a>
</li>
+ <li class="toctree-l1">
+ <a class="reference internal" href="../../user-guide/metrics.html">
+ Metrics Guide
+ </a>
+ </li>
</ul>
<p aria-level="2" class="caption" role="heading">
<span class="caption-text">
diff --git a/contributor-guide/benchmark-results/tpc-h.html b/contributor-guide/benchmark-results/tpc-h.html
index bf15717f..5966c09a 100644
--- a/contributor-guide/benchmark-results/tpc-h.html
+++ b/contributor-guide/benchmark-results/tpc-h.html
@@ -161,6 +161,11 @@ under the License.
Tuning Guide
</a>
</li>
+ <li class="toctree-l1">
+ <a class="reference internal" href="../../user-guide/metrics.html">
+ Metrics Guide
+ </a>
+ </li>
</ul>
<p aria-level="2" class="caption" role="heading">
<span class="caption-text">
diff --git a/contributor-guide/benchmarking.html b/contributor-guide/benchmarking.html
index 64a77feb..84104d6e 100644
--- a/contributor-guide/benchmarking.html
+++ b/contributor-guide/benchmarking.html
@@ -163,6 +163,11 @@ under the License.
Tuning Guide
</a>
</li>
+ <li class="toctree-l1">
+ <a class="reference internal" href="../user-guide/metrics.html">
+ Metrics Guide
+ </a>
+ </li>
</ul>
<p aria-level="2" class="caption" role="heading">
<span class="caption-text">
diff --git a/contributor-guide/contributing.html b/contributor-guide/contributing.html
index c599fe37..c38dd007 100644
--- a/contributor-guide/contributing.html
+++ b/contributor-guide/contributing.html
@@ -54,7 +54,7 @@ under the License.
<link rel="index" title="Index" href="../genindex.html" />
<link rel="search" title="Search" href="../search.html" />
<link rel="next" title="Comet Plugin Architecture" href="plugin_overview.html" />
- <link rel="prev" title="Tuning Guide" href="../user-guide/tuning.html" />
+ <link rel="prev" title="Comet Metrics" href="../user-guide/metrics.html" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<meta name="docsearch:language" content="en">
@@ -163,6 +163,11 @@ under the License.
Tuning Guide
</a>
</li>
+ <li class="toctree-l1">
+ <a class="reference internal" href="../user-guide/metrics.html">
+ Metrics Guide
+ </a>
+ </li>
</ul>
<p aria-level="2" class="caption" role="heading">
<span class="caption-text">
@@ -384,11 +389,11 @@ coordinate on issues that they are working on.</p>
<!-- Previous / next buttons -->
<div class='prev-next-area'>
- <a class='left-prev' id="prev-link" href="../user-guide/tuning.html" title="previous page">
+ <a class='left-prev' id="prev-link" href="../user-guide/metrics.html" title="previous page">
<i class="fas fa-angle-left"></i>
<div class="prev-next-info">
<p class="prev-next-subtitle">previous</p>
- <p class="prev-next-title">Tuning Guide</p>
+ <p class="prev-next-title">Comet Metrics</p>
</div>
</a>
<a class='right-next' id="next-link" href="plugin_overview.html" title="next page">
diff --git a/contributor-guide/debugging.html b/contributor-guide/debugging.html
index 9145151d..e113c3a3 100644
--- a/contributor-guide/debugging.html
+++ b/contributor-guide/debugging.html
@@ -163,6 +163,11 @@ under the License.
Tuning Guide
</a>
</li>
+ <li class="toctree-l1">
+ <a class="reference internal" href="../user-guide/metrics.html">
+ Metrics Guide
+ </a>
+ </li>
</ul>
<p aria-level="2" class="caption" role="heading">
<span class="caption-text">
diff --git a/contributor-guide/development.html b/contributor-guide/development.html
index 6f1ed9a9..53cc8c6b 100644
--- a/contributor-guide/development.html
+++ b/contributor-guide/development.html
@@ -163,6 +163,11 @@ under the License.
Tuning Guide
</a>
</li>
+ <li class="toctree-l1">
+ <a class="reference internal" href="../user-guide/metrics.html">
+ Metrics Guide
+ </a>
+ </li>
</ul>
<p aria-level="2" class="caption" role="heading">
<span class="caption-text">
diff --git a/contributor-guide/plugin_overview.html b/contributor-guide/plugin_overview.html
index e30df4f4..b305f31d 100644
--- a/contributor-guide/plugin_overview.html
+++ b/contributor-guide/plugin_overview.html
@@ -163,6 +163,11 @@ under the License.
Tuning Guide
</a>
</li>
+ <li class="toctree-l1">
+ <a class="reference internal" href="../user-guide/metrics.html">
+ Metrics Guide
+ </a>
+ </li>
</ul>
<p aria-level="2" class="caption" role="heading">
<span class="caption-text">
diff --git a/contributor-guide/profiling_native_code.html b/contributor-guide/profiling_native_code.html
index 75ad7838..d7447b4a 100644
--- a/contributor-guide/profiling_native_code.html
+++ b/contributor-guide/profiling_native_code.html
@@ -163,6 +163,11 @@ under the License.
Tuning Guide
</a>
</li>
+ <li class="toctree-l1">
+ <a class="reference internal" href="../user-guide/metrics.html">
+ Metrics Guide
+ </a>
+ </li>
</ul>
<p aria-level="2" class="caption" role="heading">
<span class="caption-text">
diff --git a/contributor-guide/spark-sql-tests.html b/contributor-guide/spark-sql-tests.html
index 76083a50..eb889e23 100644
--- a/contributor-guide/spark-sql-tests.html
+++ b/contributor-guide/spark-sql-tests.html
@@ -162,6 +162,11 @@ under the License.
Tuning Guide
</a>
</li>
+ <li class="toctree-l1">
+ <a class="reference internal" href="../user-guide/metrics.html">
+ Metrics Guide
+ </a>
+ </li>
</ul>
<p aria-level="2" class="caption" role="heading">
<span class="caption-text">
diff --git a/genindex.html b/genindex.html
index ed430752..f82e6964 100644
--- a/genindex.html
+++ b/genindex.html
@@ -160,6 +160,11 @@ under the License.
Tuning Guide
</a>
</li>
+ <li class="toctree-l1">
+ <a class="reference internal" href="user-guide/metrics.html">
+ Metrics Guide
+ </a>
+ </li>
</ul>
<p aria-level="2" class="caption" role="heading">
<span class="caption-text">
diff --git a/index.html b/index.html
index 4cc411d8..a99fb360 100644
--- a/index.html
+++ b/index.html
@@ -162,6 +162,11 @@ under the License.
Tuning Guide
</a>
</li>
+ <li class="toctree-l1">
+ <a class="reference internal" href="user-guide/metrics.html">
+ Metrics Guide
+ </a>
+ </li>
</ul>
<p aria-level="2" class="caption" role="heading">
<span class="caption-text">
@@ -327,6 +332,7 @@ as a native runtime to achieve improvement in terms of query efficiency and quer
<li class="toctree-l1"><a class="reference internal" href="user-guide/configs.html">Configuration Settings</a></li>
<li class="toctree-l1"><a class="reference internal" href="user-guide/compatibility.html">Compatibility Guide</a></li>
<li class="toctree-l1"><a class="reference internal" href="user-guide/tuning.html">Tuning Guide</a></li>
+<li class="toctree-l1"><a class="reference internal" href="user-guide/metrics.html">Metrics Guide</a></li>
</ul>
</div>
<div class="toctree-wrapper compound" id="toc-contributor-guide-links">
diff --git a/objects.inv b/objects.inv
index 49c0080e..c34abdf6 100644
Binary files a/objects.inv and b/objects.inv differ
diff --git a/search.html b/search.html
index c6d0b897..bbec4adc 100644
--- a/search.html
+++ b/search.html
@@ -167,6 +167,11 @@ under the License.
Tuning Guide
</a>
</li>
+ <li class="toctree-l1">
+ <a class="reference internal" href="user-guide/metrics.html">
+ Metrics Guide
+ </a>
+ </li>
</ul>
<p aria-level="2" class="caption" role="heading">
<span class="caption-text">
diff --git a/searchindex.js b/searchindex.js
index d7f82382..cd9a2048 100644
--- a/searchindex.js
+++ b/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"alltitles": {"1. Install Comet": [[9, "install-comet"]], "2. Clone Spark and Apply Diff": [[9, "clone-spark-and-apply-diff"]], "3. Run Spark SQL Tests": [[9, "run-spark-sql-tests"]], "ANSI mode": [[11, "ansi-mode"]], "API Differences Between Spark Versions": [[0, "api-differences-between-spark-versions"]], "ASF Links": [[10, null]], "Adding Spark-side Tests for the New Expression": [[0, "adding-spark-side-tests-for-the-new-expression"]], "Adding a New Expression": [[0, [...]
\ No newline at end of file
+Search.setIndex({"alltitles": {"1. Install Comet": [[9, "install-comet"]], "2. Clone Spark and Apply Diff": [[9, "clone-spark-and-apply-diff"]], "3. Run Spark SQL Tests": [[9, "run-spark-sql-tests"]], "ANSI mode": [[11, "ansi-mode"]], "API Differences Between Spark Versions": [[0, "api-differences-between-spark-versions"]], "ASF Links": [[10, null]], "Adding Spark-side Tests for the New Expression": [[0, "adding-spark-side-tests-for-the-new-expression"]], "Adding a New Expression": [[0, [...]
\ No newline at end of file
diff --git a/user-guide/compatibility.html b/user-guide/compatibility.html
index 4308e239..d8020b57 100644
--- a/user-guide/compatibility.html
+++ b/user-guide/compatibility.html
@@ -163,6 +163,11 @@ under the License.
Tuning Guide
</a>
</li>
+ <li class="toctree-l1">
+ <a class="reference internal" href="metrics.html">
+ Metrics Guide
+ </a>
+ </li>
</ul>
<p aria-level="2" class="caption" role="heading">
<span class="caption-text">
diff --git a/user-guide/configs.html b/user-guide/configs.html
index 6ad8cb8c..f4235bb1 100644
--- a/user-guide/configs.html
+++ b/user-guide/configs.html
@@ -163,6 +163,11 @@ under the License.
Tuning Guide
</a>
</li>
+ <li class="toctree-l1">
+ <a class="reference internal" href="metrics.html">
+ Metrics Guide
+ </a>
+ </li>
</ul>
<p aria-level="2" class="caption" role="heading">
<span class="caption-text">
diff --git a/user-guide/datasources.html b/user-guide/datasources.html
index ded9c996..86bf2417 100644
--- a/user-guide/datasources.html
+++ b/user-guide/datasources.html
@@ -163,6 +163,11 @@ under the License.
Tuning Guide
</a>
</li>
+ <li class="toctree-l1">
+ <a class="reference internal" href="metrics.html">
+ Metrics Guide
+ </a>
+ </li>
</ul>
<p aria-level="2" class="caption" role="heading">
<span class="caption-text">
diff --git a/user-guide/datatypes.html b/user-guide/datatypes.html
index 11b0746f..00a8a575 100644
--- a/user-guide/datatypes.html
+++ b/user-guide/datatypes.html
@@ -163,6 +163,11 @@ under the License.
Tuning Guide
</a>
</li>
+ <li class="toctree-l1">
+ <a class="reference internal" href="metrics.html">
+ Metrics Guide
+ </a>
+ </li>
</ul>
<p aria-level="2" class="caption" role="heading">
<span class="caption-text">
diff --git a/user-guide/expressions.html b/user-guide/expressions.html
index 0a7409bd..2d8b41de 100644
--- a/user-guide/expressions.html
+++ b/user-guide/expressions.html
@@ -163,6 +163,11 @@ under the License.
Tuning Guide
</a>
</li>
+ <li class="toctree-l1">
+ <a class="reference internal" href="metrics.html">
+ Metrics Guide
+ </a>
+ </li>
</ul>
<p aria-level="2" class="caption" role="heading">
<span class="caption-text">
diff --git a/user-guide/installation.html b/user-guide/installation.html
index 5f01bf26..cb729cae 100644
--- a/user-guide/installation.html
+++ b/user-guide/installation.html
@@ -163,6 +163,11 @@ under the License.
Tuning Guide
</a>
</li>
+ <li class="toctree-l1">
+ <a class="reference internal" href="metrics.html">
+ Metrics Guide
+ </a>
+ </li>
</ul>
<p aria-level="2" class="caption" role="heading">
<span class="caption-text">
diff --git a/user-guide/kubernetes.html b/user-guide/kubernetes.html
index 511e3111..1472d2a5 100644
--- a/user-guide/kubernetes.html
+++ b/user-guide/kubernetes.html
@@ -163,6 +163,11 @@ under the License.
Tuning Guide
</a>
</li>
+ <li class="toctree-l1">
+ <a class="reference internal" href="metrics.html">
+ Metrics Guide
+ </a>
+ </li>
</ul>
<p aria-level="2" class="caption" role="heading">
<span class="caption-text">
diff --git a/user-guide/overview.html b/user-guide/metrics.html
similarity index 58%
copy from user-guide/overview.html
copy to user-guide/metrics.html
index ed7e7b82..88c969f1 100644
--- a/user-guide/overview.html
+++ b/user-guide/metrics.html
@@ -24,7 +24,7 @@ under the License.
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="viewport" content="width=device-width, initial-scale=1" />
- <title>Comet Overview — Apache DataFusion Comet documentation</title>
+ <title>Comet Metrics — Apache DataFusion Comet documentation</title>
<link href="../_static/styles/theme.css?digest=1999514e3f237ded88cf" rel="stylesheet">
<link href="../_static/styles/pydata-sphinx-theme.css?digest=1999514e3f237ded88cf" rel="stylesheet">
@@ -53,8 +53,8 @@ under the License.
<script async="true" defer="true" src="https://buttons.github.io/buttons.js"></script>
<link rel="index" title="Index" href="../genindex.html" />
<link rel="search" title="Search" href="../search.html" />
- <link rel="next" title="Installing DataFusion Comet" href="installation.html" />
- <link rel="prev" title="Apache DataFusion Comet" href="../index.html" />
+ <link rel="next" title="Contributing to Apache DataFusion Comet" href="../contributor-guide/contributing.html" />
+ <link rel="prev" title="Tuning Guide" href="tuning.html" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<meta name="docsearch:language" content="en">
@@ -108,8 +108,8 @@ under the License.
</span>
</p>
<ul class="current nav bd-sidenav">
- <li class="toctree-l1 current active">
- <a class="current reference internal" href="#">
+ <li class="toctree-l1">
+ <a class="reference internal" href="overview.html">
Comet Overview
</a>
</li>
@@ -163,6 +163,11 @@ under the License.
Tuning Guide
</a>
</li>
+ <li class="toctree-l1 current active">
+ <a class="current reference internal" href="#">
+ Metrics Guide
+ </a>
+ </li>
</ul>
<p aria-level="2" class="caption" role="heading">
<span class="caption-text">
@@ -282,19 +287,38 @@ under the License.
<nav id="bd-toc-nav">
<ul class="visible nav section-nav flex-column">
<li class="toc-h2 nav-item toc-entry">
- <a class="reference internal nav-link" href="#architecture">
- Architecture
- </a>
- </li>
- <li class="toc-h2 nav-item toc-entry">
- <a class="reference internal nav-link" href="#feature-parity-with-apache-spark">
- Feature Parity with Apache Spark
+ <a class="reference internal nav-link" href="#spark-sql-metrics">
+ Spark SQL Metrics
</a>
+ <ul class="nav section-nav flex-column">
+ <li class="toc-h3 nav-item toc-entry">
+ <a class="reference internal nav-link" href="#cometscanexec">
+ CometScanExec
+ </a>
+ </li>
+ <li class="toc-h3 nav-item toc-entry">
+ <a class="reference internal nav-link" href="#exchange">
+ Exchange
+ </a>
+ </li>
+ </ul>
</li>
<li class="toc-h2 nav-item toc-entry">
- <a class="reference internal nav-link" href="#getting-started">
- Getting Started
+ <a class="reference internal nav-link" href="#native-metrics">
+ Native Metrics
</a>
+ <ul class="nav section-nav flex-column">
+ <li class="toc-h3 nav-item toc-entry">
+ <a class="reference internal nav-link" href="#scanexec">
+ ScanExec
+ </a>
+ </li>
+ <li class="toc-h3 nav-item toc-entry">
+ <a class="reference internal nav-link" href="#shufflewriterexec">
+ ShuffleWriterExec
+ </a>
+ </li>
+ </ul>
</li>
</ul>
@@ -305,7 +329,7 @@ under the License.
<div class="tocsection editthispage">
- <a href="https://github.com/apache/datafusion-comet/edit/main/docs/source/user-guide/overview.md">
+ <a href="https://github.com/apache/datafusion-comet/edit/main/docs/source/user-guide/metrics.md">
<i class="fas fa-pencil-alt"></i> Edit this page
</a>
</div>
@@ -325,58 +349,122 @@ under the License.
<div>
<!---
- Licensed to the Apache Software Foundation (ASF) under one
- or more contributor license agreements. See the NOTICE file
- distributed with this work for additional information
- regarding copyright ownership. The ASF licenses this file
- to you under the Apache License, Version 2.0 (the
- "License"); you may not use this file except in compliance
- with the License. You may obtain a copy of the License at
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
- http://www.apache.org/licenses/LICENSE-2.0
+http://www.apache.org/licenses/LICENSE-2.0
- Unless required by applicable law or agreed to in writing,
- software distributed under the License is distributed on an
- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
- KIND, either express or implied. See the License for the
- specific language governing permissions and limitations
- under the License.
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
-->
-<section id="comet-overview">
-<h1>Comet Overview<a class="headerlink" href="#comet-overview" title="Link to this heading">¶</a></h1>
-<p>Apache DataFusion Comet is a high-performance accelerator for Apache Spark, built on top of the powerful
-<a class="reference external" href="https://datafusion.apache.org">Apache DataFusion</a> query engine. Comet is designed to significantly enhance the
-performance of Apache Spark workloads while leveraging commodity hardware and seamlessly integrating with the
-Spark ecosystem without requiring any code changes.</p>
-<p>The following diagram provides an overview of Comet’s architecture.</p>
-<p><img alt="Comet Overview" src="../_images/comet-overview.png" /></p>
-<p>Comet aims to support:</p>
-<ul class="simple">
-<li><p>a native Parquet implementation, including both reader and writer</p></li>
-<li><p>full implementation of Spark operators, including
-Filter/Project/Aggregation/Join/Exchange etc.</p></li>
-<li><p>full implementation of Spark built-in expressions.</p></li>
-<li><p>a UDF framework for users to migrate their existing UDF to native</p></li>
-</ul>
-<section id="architecture">
-<h2>Architecture<a class="headerlink" href="#architecture" title="Link to this heading">¶</a></h2>
-<p>The following diagram shows how Comet integrates with Apache Spark.</p>
-<p><img alt="Comet System Diagram" src="../_images/comet-system-diagram.png" /></p>
+<section id="comet-metrics">
+<h1>Comet Metrics<a class="headerlink" href="#comet-metrics" title="Link to this heading">¶</a></h1>
+<section id="spark-sql-metrics">
+<h2>Spark SQL Metrics<a class="headerlink" href="#spark-sql-metrics" title="Link to this heading">¶</a></h2>
+<p>Set <code class="docutils literal notranslate"><span class="pre">spark.comet.metrics.detailed=true</span></code> to see all available Comet metrics.</p>
+<section id="cometscanexec">
+<h3>CometScanExec<a class="headerlink" href="#cometscanexec" title="Link to this heading">¶</a></h3>
+<table class="table">
+<thead>
+<tr class="row-odd"><th class="head"><p>Metric</p></th>
+<th class="head"><p>Description</p></th>
+</tr>
+</thead>
+<tbody>
+<tr class="row-even"><td><p><code class="docutils literal notranslate"><span class="pre">scan</span> <span class="pre">time</span></code></p></td>
+<td><p>Total time to scan Parquet files. This is not directly comparable to the same metric in Spark. Both Comet and Spark measure the time in nanoseconds, but Spark converts it to whole milliseconds per batch, losing sub-millisecond precision, while Comet does not.</p></td>
+</tr>
+</tbody>
+</table>
+</section>
+<section id="exchange">
+<h3>Exchange<a class="headerlink" href="#exchange" title="Link to this heading">¶</a></h3>
+<p>Comet adds some additional metrics:</p>
+<table class="table">
+<thead>
+<tr class="row-odd"><th class="head"><p>Metric</p></th>
+<th class="head"><p>Description</p></th>
+</tr>
+</thead>
+<tbody>
+<tr class="row-even"><td><p><code class="docutils literal notranslate"><span class="pre">native</span> <span class="pre">shuffle</span> <span class="pre">time</span></code></p></td>
+<td><p>Total time in native code excluding any child operators.</p></td>
+</tr>
+<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span class="pre">repartition</span> <span class="pre">time</span></code></p></td>
+<td><p>Time to repartition batches.</p></td>
+</tr>
+<tr class="row-even"><td><p><code class="docutils literal notranslate"><span class="pre">memory</span> <span class="pre">pool</span> <span class="pre">time</span></code></p></td>
+<td><p>Time spent interacting with the memory pool.</p></td>
+</tr>
+<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span class="pre">encoding</span> <span class="pre">and</span> <span class="pre">compression</span> <span class="pre">time</span></code></p></td>
+<td><p>Time to encode batches in IPC format and compress using ZSTD.</p></td>
+</tr>
+</tbody>
+</table>
+</section>
+</section>
+<section id="native-metrics">
+<h2>Native Metrics<a class="headerlink" href="#native-metrics" title="Link to this heading">¶</a></h2>
+<p>Setting <code class="docutils literal notranslate"><span class="pre">spark.comet.explain.native.enabled=true</span></code> causes native plans to be logged in each executor, along with the
+metrics for each plan. Because there is one native plan per task, this logging is very verbose.</p>
+<p>Here is a guide to some of the native metrics.</p>
+<section id="scanexec">
+<h3>ScanExec<a class="headerlink" href="#scanexec" title="Link to this heading">¶</a></h3>
+<table class="table">
+<thead>
+<tr class="row-odd"><th class="head"><p>Metric</p></th>
+<th class="head"><p>Description</p></th>
+</tr>
+</thead>
+<tbody>
+<tr class="row-even"><td><p><code class="docutils literal notranslate"><span class="pre">elapsed_compute</span></code></p></td>
+<td><p>Total time spent in this operator, fetching batches from a JVM iterator.</p></td>
+</tr>
+<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span class="pre">jvm_fetch_time</span></code></p></td>
+<td><p>Time spent in the JVM fetching input batches to be read by this <code class="docutils literal notranslate"><span class="pre">ScanExec</span></code> instance.</p></td>
+</tr>
+<tr class="row-even"><td><p><code class="docutils literal notranslate"><span class="pre">arrow_ffi_time</span></code></p></td>
+<td><p>Time spent using Arrow FFI to create Arrow batches from the memory addresses returned from the JVM.</p></td>
+</tr>
+</tbody>
+</table>
</section>
-<section id="feature-parity-with-apache-spark">
-<h2>Feature Parity with Apache Spark<a class="headerlink" href="#feature-parity-with-apache-spark" title="Link to this heading">¶</a></h2>
-<p>The project strives to keep feature parity with Apache Spark, that is,
-users should expect the same behavior (w.r.t features, configurations,
-query results, etc) with Comet turned on or turned off in their Spark
-jobs. In addition, Comet extension should automatically detect unsupported
-features and fallback to Spark engine.</p>
-<p>To achieve this, besides unit tests within Comet itself, we also re-use
-Spark SQL tests and make sure they all pass with Comet extension
-enabled.</p>
+<section id="shufflewriterexec">
+<h3>ShuffleWriterExec<a class="headerlink" href="#shufflewriterexec"
title="Link to this heading">¶</a></h3>
+<table class="table">
+<thead>
+<tr class="row-odd"><th class="head"><p>Metric</p></th>
+<th class="head"><p>Description</p></th>
+</tr>
+</thead>
+<tbody>
+<tr class="row-even"><td><p><code class="docutils literal notranslate"><span
class="pre">elapsed_compute</span></code></p></td>
+<td><p>Total time excluding any child operators.</p></td>
+</tr>
+<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span
class="pre">repart_time</span></code></p></td>
+<td><p>Time to repartition batches.</p></td>
+</tr>
+<tr class="row-even"><td><p><code class="docutils literal notranslate"><span
class="pre">ipc_time</span></code></p></td>
+<td><p>Time to encode batches in IPC format and compress using ZSTD.</p></td>
+</tr>
+<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span class="pre">mempool_time</span></code></p></td>
+<td><p>Time interacting with memory pool.</p></td>
+</tr>
+<tr class="row-even"><td><p><code class="docutils literal notranslate"><span class="pre">write_time</span></code></p></td>
+<td><p>Time spent writing bytes to disk.</p></td>
+</tr>
+</tbody>
+</table>
</section>
-<section id="getting-started">
-<h2>Getting Started<a class="headerlink" href="#getting-started" title="Link to this heading">¶</a></h2>
-<p>Refer to the <a class="reference internal" href="installation.html"><span class="std std-doc">Comet Installation Guide</span></a> to get started.</p>
</section>
</section>
@@ -386,17 +474,17 @@ enabled.</p>
<!-- Previous / next buttons -->
<div class='prev-next-area'>
- <a class='left-prev' id="prev-link" href="../index.html" title="previous page">
+ <a class='left-prev' id="prev-link" href="tuning.html" title="previous page">
<i class="fas fa-angle-left"></i>
<div class="prev-next-info">
<p class="prev-next-subtitle">previous</p>
- <p class="prev-next-title">Apache DataFusion Comet</p>
+ <p class="prev-next-title">Tuning Guide</p>
</div>
</a>
- <a class='right-next' id="next-link" href="installation.html" title="next page">
+ <a class='right-next' id="next-link" href="../contributor-guide/contributing.html" title="next page">
<div class="prev-next-info">
<p class="prev-next-subtitle">next</p>
- <p class="prev-next-title">Installing DataFusion Comet</p>
+ <p class="prev-next-title">Contributing to Apache DataFusion Comet</p>
</div>
<i class="fas fa-angle-right"></i>
</a>
diff --git a/user-guide/operators.html b/user-guide/operators.html
index d7420218..74782397 100644
--- a/user-guide/operators.html
+++ b/user-guide/operators.html
@@ -163,6 +163,11 @@ under the License.
Tuning Guide
</a>
</li>
+ <li class="toctree-l1">
+ <a class="reference internal" href="metrics.html">
+ Metrics Guide
+ </a>
+ </li>
</ul>
<p aria-level="2" class="caption" role="heading">
<span class="caption-text">
diff --git a/user-guide/overview.html b/user-guide/overview.html
index ed7e7b82..70c68309 100644
--- a/user-guide/overview.html
+++ b/user-guide/overview.html
@@ -163,6 +163,11 @@ under the License.
Tuning Guide
</a>
</li>
+ <li class="toctree-l1">
+ <a class="reference internal" href="metrics.html">
+ Metrics Guide
+ </a>
+ </li>
</ul>
<p aria-level="2" class="caption" role="heading">
<span class="caption-text">
diff --git a/user-guide/source.html b/user-guide/source.html
index a53f37fe..5a121d88 100644
--- a/user-guide/source.html
+++ b/user-guide/source.html
@@ -163,6 +163,11 @@ under the License.
Tuning Guide
</a>
</li>
+ <li class="toctree-l1">
+ <a class="reference internal" href="metrics.html">
+ Metrics Guide
+ </a>
+ </li>
</ul>
<p aria-level="2" class="caption" role="heading">
<span class="caption-text">
diff --git a/user-guide/tuning.html b/user-guide/tuning.html
index f6c1019b..30fecd2c 100644
--- a/user-guide/tuning.html
+++ b/user-guide/tuning.html
@@ -53,7 +53,7 @@ under the License.
<script async="true" defer="true" src="https://buttons.github.io/buttons.js"></script>
<link rel="index" title="Index" href="../genindex.html" />
<link rel="search" title="Search" href="../search.html" />
- <link rel="next" title="Contributing to Apache DataFusion Comet" href="../contributor-guide/contributing.html" />
+ <link rel="next" title="Comet Metrics" href="metrics.html" />
<link rel="prev" title="Compatibility Guide" href="compatibility.html" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<meta name="docsearch:language" content="en">
@@ -163,6 +163,11 @@ under the License.
Tuning Guide
</a>
</li>
+ <li class="toctree-l1">
+ <a class="reference internal" href="metrics.html">
+ Metrics Guide
+ </a>
+ </li>
</ul>
<p aria-level="2" class="caption" role="heading">
<span class="caption-text">
@@ -332,28 +337,6 @@ under the License.
</li>
</ul>
</li>
- <li class="toc-h2 nav-item toc-entry">
- <a class="reference internal nav-link" href="#metrics">
- Metrics
- </a>
- <ul class="nav section-nav flex-column">
- <li class="toc-h3 nav-item toc-entry">
- <a class="reference internal nav-link" href="#spark-sql-metrics">
- Spark SQL Metrics
- </a>
- </li>
- <li class="toc-h3 nav-item toc-entry">
- <a class="reference internal nav-link" href="#native-metrics">
- Native Metrics
- </a>
- </li>
- <li class="toc-h3 nav-item toc-entry">
- <a class="reference internal nav-link" href="#scanexec">
- ScanExec
- </a>
- </li>
- </ul>
- </li>
<li class="toc-h2 nav-item toc-entry">
<a class="reference internal nav-link" href="#explain-plan">
Explain Plan
@@ -485,45 +468,6 @@ then any shuffle operations that cannot be supported in this mode will fall back
</section>
</section>
</section>
-<section id="metrics">
-<h2>Metrics<a class="headerlink" href="#metrics" title="Link to this heading">¶</a></h2>
-<section id="spark-sql-metrics">
-<h3>Spark SQL Metrics<a class="headerlink" href="#spark-sql-metrics" title="Link to this heading">¶</a></h3>
-<p>Some Comet metrics are not directly comparable to Spark metrics in some cases:</p>
-<ul class="simple">
-<li><p><code class="docutils literal notranslate"><span class="pre">CometScanExec</span></code> uses nanoseconds for total scan time. Spark also measures scan time in nanoseconds but converts to
-milliseconds <em>per batch</em> which can result in a large loss of precision, making it difficult to compare scan times
-between Spark and Comet.</p></li>
-</ul>
-</section>
-<section id="native-metrics">
-<h3>Native Metrics<a class="headerlink" href="#native-metrics" title="Link to this heading">¶</a></h3>
-<p>Setting <code class="docutils literal notranslate"><span class="pre">spark.comet.explain.native.enabled=true</span></code> will cause native plans to be logged in each executor. Metrics are
-logged for each native plan (and there is one plan per task, so this is very verbose).</p>
-<p>Here is a guide to some of the native metrics.</p>
-</section>
-<section id="scanexec">
-<h3>ScanExec<a class="headerlink" href="#scanexec" title="Link to this heading">¶</a></h3>
-<table class="table">
-<thead>
-<tr class="row-odd"><th class="head"><p>Metric</p></th>
-<th class="head"><p>Description</p></th>
-</tr>
-</thead>
-<tbody>
-<tr class="row-even"><td><p><code class="docutils literal notranslate"><span class="pre">elapsed_compute</span></code></p></td>
-<td><p>Total time spent in this operator, fetching batches from a JVM iterator.</p></td>
-</tr>
-<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span class="pre">jvm_fetch_time</span></code></p></td>
-<td><p>Time spent in the JVM fetching input batches to be read by this <code class="docutils literal notranslate"><span class="pre">ScanExec</span></code> instance.</p></td>
-</tr>
-<tr class="row-even"><td><p><code class="docutils literal notranslate"><span class="pre">arrow_ffi_time</span></code></p></td>
-<td><p>Time spent using Arrow FFI to create Arrow batches from the memory addresses returned from the JVM.</p></td>
-</tr>
-</tbody>
-</table>
-</section>
-</section>
<section id="explain-plan">
<h2>Explain Plan<a class="headerlink" href="#explain-plan" title="Link to this heading">¶</a></h2>
<section id="extended-explain">
@@ -552,10 +496,10 @@ To enable this, in the Spark configuration, set the following:</p>
<p class="prev-next-title">Compatibility Guide</p>
</div>
</a>
- <a class='right-next' id="next-link" href="../contributor-guide/contributing.html" title="next page">
+ <a class='right-next' id="next-link" href="metrics.html" title="next page">
<div class="prev-next-info">
<p class="prev-next-subtitle">next</p>
- <p class="prev-next-title">Contributing to Apache DataFusion Comet</p>
+ <p class="prev-next-title">Comet Metrics</p>
</div>
<i class="fas fa-angle-right"></i>
</a>