This is an automated email from the ASF dual-hosted git repository.
agrove pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/datafusion-comet.git
The following commit(s) were added to refs/heads/main by this push:
new 414e7a36 docs: Add benchmarking guide (#444)
414e7a36 is described below
commit 414e7a36a7aa8340c0ebf85a749e4306c8376a19
Author: Andy Grove <[email protected]>
AuthorDate: Fri May 17 16:24:48 2024 -0600
docs: Add benchmarking guide (#444)
* add benchmarking guide
* add ASF header
---
docs/source/contributor-guide/benchmarking.md | 62 +++++++++++++++++++++++++++
docs/source/index.rst | 1 +
2 files changed, 63 insertions(+)
diff --git a/docs/source/contributor-guide/benchmarking.md b/docs/source/contributor-guide/benchmarking.md
new file mode 100644
index 00000000..502b35c2
--- /dev/null
+++ b/docs/source/contributor-guide/benchmarking.md
@@ -0,0 +1,62 @@
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Comet Benchmarking Guide
+
+To track progress on performance, we regularly run benchmarks derived from TPC-H and TPC-DS. Benchmarking scripts are
+available in the [DataFusion Benchmarks](https://github.com/apache/datafusion-benchmarks) GitHub repository.
+
+Here is an example command for running the benchmarks. This command will need to be adapted based on the Spark
+environment and location of data files.
+
+This command assumes that `datafusion-benchmarks` is checked out in a parallel directory to `datafusion-comet`.
+
+```shell
+$SPARK_HOME/bin/spark-submit \
+ --master "local[*]" \
+ --conf spark.driver.memory=8G \
+ --conf spark.executor.memory=64G \
+ --conf spark.executor.cores=16 \
+ --conf spark.cores.max=16 \
+ --conf spark.eventLog.enabled=true \
+ --conf spark.sql.autoBroadcastJoinThreshold=-1 \
+ --jars $COMET_JAR \
+ --conf spark.driver.extraClassPath=$COMET_JAR \
+ --conf spark.executor.extraClassPath=$COMET_JAR \
+ --conf spark.sql.extensions=org.apache.comet.CometSparkSessionExtensions \
+ --conf spark.comet.enabled=true \
+ --conf spark.comet.exec.enabled=true \
+ --conf spark.comet.exec.all.enabled=true \
+ --conf spark.comet.cast.allowIncompatible=true \
+ --conf spark.comet.explainFallback.enabled=true \
+ --conf spark.comet.parquet.io.enabled=false \
+ --conf spark.comet.batchSize=8192 \
+ --conf spark.comet.columnar.shuffle.enabled=false \
+ --conf spark.comet.exec.shuffle.enabled=true \
+ --conf spark.shuffle.manager=org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager \
+ --conf spark.sql.adaptive.coalescePartitions.enabled=false \
+ --conf spark.comet.shuffle.enforceMode.enabled=true \
+ ../datafusion-benchmarks/runners/datafusion-comet/tpcbench.py \
+ --benchmark tpch \
+ --data /mnt/bigdata/tpch/sf100-parquet/ \
+ --queries ../datafusion-benchmarks/tpch/queries
+```
+
+Comet performance can be compared to regular Spark performance by running the benchmark twice, once with
+`spark.comet.enabled` set to `true` and once with it set to `false`.
\ No newline at end of file
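
The two-run comparison described at the end of the guide could be scripted roughly as follows. This is a minimal sketch and not part of the commit: the `run_benchmark` function, the `DRY_RUN` switch, and the fallback values for `SPARK_HOME` and `COMET_JAR` are assumptions made for illustration, and only a subset of the guide's `--conf` options is repeated.

```shell
#!/bin/sh
# Sketch only: runs the TPC-H benchmark with Comet enabled, then disabled.
# DRY_RUN is a hypothetical switch for this sketch; it defaults to 1 and
# prints each command instead of running it. Set DRY_RUN=0 to execute.
DRY_RUN="${DRY_RUN:-1}"
SPARK_HOME="${SPARK_HOME:-/opt/spark}"        # assumed fallback location
COMET_JAR="${COMET_JAR:-/path/to/comet.jar}"  # placeholder, not a real path

run_benchmark() {
  comet_enabled="$1"  # "true" or "false"
  # Only a subset of the guide's options is repeated here; copy the
  # remaining --conf settings from the full command when running for real.
  set -- "$SPARK_HOME/bin/spark-submit" \
    --master "local[*]" \
    --jars "$COMET_JAR" \
    --conf "spark.driver.extraClassPath=$COMET_JAR" \
    --conf "spark.executor.extraClassPath=$COMET_JAR" \
    --conf spark.sql.extensions=org.apache.comet.CometSparkSessionExtensions \
    --conf "spark.comet.enabled=$comet_enabled" \
    --conf spark.comet.exec.enabled=true \
    ../datafusion-benchmarks/runners/datafusion-comet/tpcbench.py \
    --benchmark tpch \
    --data /mnt/bigdata/tpch/sf100-parquet/ \
    --queries ../datafusion-benchmarks/tpch/queries
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"   # print the assembled command
  else
    "$@"        # run the assembled command
  fi
}

# Run once with Comet and once with plain Spark, keeping everything else equal.
for enabled in true false; do
  echo "== spark.comet.enabled=$enabled =="
  run_benchmark "$enabled"
done
```

Keeping every option except `spark.comet.enabled` identical between the two runs is what makes the timings comparable. Running with the default `DRY_RUN=1` first is a cheap way to check the assembled commands before committing to a long benchmark run.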
diff --git a/docs/source/index.rst b/docs/source/index.rst
index eb42950b..819f7201 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -58,6 +58,7 @@ as a native runtime to achieve improvement in terms of query efficiency and quer
Comet Plugin Overview <contributor-guide/plugin_overview>
Development Guide <contributor-guide/development>
Debugging Guide <contributor-guide/debugging>
+ Benchmarking Guide <contributor-guide/benchmarking>
Profiling Native Code <contributor-guide/profiling_native_code>
Github and Issue Tracker <https://github.com/apache/datafusion-comet>