[iceberg] branch master updated: Docs: Describe available Benchmarks and how to run them (#2767)

russellspitzer Fri, 02 Jul 2021 05:04:44 -0700

This is an automated email from the ASF dual-hosted git repository.

russellspitzer pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/iceberg.git



The following commit(s) were added to refs/heads/master by this push:
     new cf0b94f  Docs: Describe available Benchmarks and how to run them 
(#2767)
cf0b94f is described below

commit cf0b94f69ec2659d59fcb4aa1be5865d2db03ffd
Author: Eduard Tudenhöfner <[email protected]>
AuthorDate: Fri Jul 2 14:04:29 2021 +0200

    Docs: Describe available Benchmarks and how to run them (#2767)
---
 site/docs/benchmarks.md | 114 ++++++++++++++++++++++++++++++++++++++++++++++++
 site/docs/community.md  |   6 +++
 2 files changed, 120 insertions(+)

diff --git a/site/docs/benchmarks.md b/site/docs/benchmarks.md
new file mode 100644
index 0000000..40ee972
--- /dev/null
+++ b/site/docs/benchmarks.md
@@ -0,0 +1,114 @@
+<!--
+  - Licensed to the Apache Software Foundation (ASF) under one
+  - or more contributor license agreements.  See the NOTICE file
+  - distributed with this work for additional information
+  - regarding copyright ownership.  The ASF licenses this file
+  - to you under the Apache License, Version 2.0 (the
+  - "License"); you may not use this file except in compliance
+  - with the License.  You may obtain a copy of the License at
+  -
+  -   http://www.apache.org/licenses/LICENSE-2.0
+  -
+  - Unless required by applicable law or agreed to in writing,
+  - software distributed under the License is distributed on an
+  - "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  - KIND, either express or implied.  See the License for the
+  - specific language governing permissions and limitations
+  - under the License.
+  -->
+
+## Available Benchmarks and how to run them
+
+Benchmarks are located under `<project-name>/jmh`. It is generally preferable 
to run only the benchmarks of interest rather than all available benchmarks.
+Also note that JMH benchmarks run within the same JVM as the 
system-under-test, so results might vary between runs.
+
+
+### IcebergSourceNestedListParquetDataWriteBenchmark
+A benchmark that evaluates the performance of writing nested Parquet data 
using Iceberg and the built-in file source in Spark. To run this benchmark for 
either spark-2 or spark-3:
+
+`./gradlew :iceberg-spark[2|3]:jmh 
-PjmhIncludeRegex=IcebergSourceNestedListParquetDataWriteBenchmark 
-PjmhOutputPath=benchmark/iceberg-source-nested-list-parquet-data-write-benchmark-result.txt`
+
+### SparkParquetReadersNestedDataBenchmark
+A benchmark that evaluates the performance of reading nested Parquet data 
using Iceberg and Spark Parquet readers. To run this benchmark for either 
spark-2 or spark-3:
+
+`./gradlew :iceberg-spark[2|3]:jmh 
-PjmhIncludeRegex=SparkParquetReadersNestedDataBenchmark 
-PjmhOutputPath=benchmark/spark-parquet-readers-nested-data-benchmark-result.txt`
+
+### SparkParquetWritersFlatDataBenchmark
+A benchmark that evaluates the performance of writing Parquet data with a flat 
schema using Iceberg and Spark Parquet writers. To run this benchmark for 
either spark-2 or spark-3:
+
+`./gradlew :iceberg-spark[2|3]:jmh 
-PjmhIncludeRegex=SparkParquetWritersFlatDataBenchmark 
-PjmhOutputPath=benchmark/spark-parquet-writers-flat-data-benchmark-result.txt`
+
+### IcebergSourceFlatORCDataReadBenchmark
+A benchmark that evaluates the performance of reading ORC data with a flat 
schema using Iceberg and the built-in file source in Spark. To run this 
benchmark for either spark-2 or spark-3:
+
+`./gradlew :iceberg-spark[2|3]:jmh 
-PjmhIncludeRegex=IcebergSourceFlatORCDataReadBenchmark 
-PjmhOutputPath=benchmark/iceberg-source-flat-orc-data-read-benchmark-result.txt`
+
+### SparkParquetReadersFlatDataBenchmark
+A benchmark that evaluates the performance of reading Parquet data with a flat 
schema using Iceberg and Spark Parquet readers. To run this benchmark for 
either spark-2 or spark-3:
+
+`./gradlew :iceberg-spark[2|3]:jmh 
-PjmhIncludeRegex=SparkParquetReadersFlatDataBenchmark 
-PjmhOutputPath=benchmark/spark-parquet-readers-flat-data-benchmark-result.txt`
+
+### VectorizedReadDictionaryEncodedFlatParquetDataBenchmark
+A benchmark to compare performance of reading Parquet dictionary encoded data 
with a flat schema using vectorized Iceberg read path and the built-in file 
source in Spark. To run this benchmark for either spark-2 or spark-3:
+
+`./gradlew :iceberg-spark[2|3]:jmh 
-PjmhIncludeRegex=VectorizedReadDictionaryEncodedFlatParquetDataBenchmark 
-PjmhOutputPath=benchmark/vectorized-read-dict-encoded-flat-parquet-data-result.txt`
+
+### IcebergSourceNestedListORCDataWriteBenchmark
+A benchmark that evaluates the performance of writing nested ORC data 
using Iceberg and the built-in file source in Spark. To run this benchmark for 
either spark-2 or spark-3:
+
+`./gradlew :iceberg-spark[2|3]:jmh 
-PjmhIncludeRegex=IcebergSourceNestedListORCDataWriteBenchmark 
-PjmhOutputPath=benchmark/iceberg-source-nested-list-orc-data-write-benchmark-result.txt`
+
+### VectorizedReadFlatParquetDataBenchmark
+A benchmark to compare performance of reading Parquet data with a flat schema 
using vectorized Iceberg read path and the built-in file source in Spark. To 
run this benchmark for either spark-2 or spark-3:
+
+`./gradlew :iceberg-spark[2|3]:jmh 
-PjmhIncludeRegex=VectorizedReadFlatParquetDataBenchmark 
-PjmhOutputPath=benchmark/vectorized-read-flat-parquet-data-result.txt`
+
+### IcebergSourceFlatParquetDataWriteBenchmark
+A benchmark that evaluates the performance of writing Parquet data with a flat 
schema using Iceberg and the built-in file source in Spark. To run this 
benchmark for either spark-2 or spark-3:
+
+`./gradlew :iceberg-spark[2|3]:jmh 
-PjmhIncludeRegex=IcebergSourceFlatParquetDataWriteBenchmark 
-PjmhOutputPath=benchmark/iceberg-source-flat-parquet-data-write-benchmark-result.txt`
+
+### IcebergSourceNestedAvroDataReadBenchmark
+A benchmark that evaluates the performance of reading nested Avro data 
using Iceberg and the built-in file source in Spark. To run this benchmark for 
either spark-2 or spark-3:
+
+`./gradlew :iceberg-spark[2|3]:jmh 
-PjmhIncludeRegex=IcebergSourceNestedAvroDataReadBenchmark 
-PjmhOutputPath=benchmark/iceberg-source-nested-avro-data-read-benchmark-result.txt`
+
+### IcebergSourceFlatAvroDataReadBenchmark
+A benchmark that evaluates the performance of reading Avro data with a flat 
schema using Iceberg and the built-in file source in Spark. To run this 
benchmark for either spark-2 or spark-3:
+
+`./gradlew :iceberg-spark[2|3]:jmh 
-PjmhIncludeRegex=IcebergSourceFlatAvroDataReadBenchmark 
-PjmhOutputPath=benchmark/iceberg-source-flat-avro-data-read-benchmark-result.txt`
+
+### IcebergSourceNestedParquetDataWriteBenchmark
+A benchmark that evaluates the performance of writing nested Parquet data 
using Iceberg and the built-in file source in Spark. To run this benchmark for 
either spark-2 or spark-3:
+
+`./gradlew :iceberg-spark[2|3]:jmh 
-PjmhIncludeRegex=IcebergSourceNestedParquetDataWriteBenchmark 
-PjmhOutputPath=benchmark/iceberg-source-nested-parquet-data-write-benchmark-result.txt`
+
+### IcebergSourceNestedParquetDataReadBenchmark
+A benchmark that evaluates the performance of reading nested Parquet data 
using Iceberg and the built-in file source in Spark. To run this benchmark for 
either spark-2 or spark-3:
+
+`./gradlew :iceberg-spark[2|3]:jmh 
-PjmhIncludeRegex=IcebergSourceNestedParquetDataReadBenchmark 
-PjmhOutputPath=benchmark/iceberg-source-nested-parquet-data-read-benchmark-result.txt`
+
+### IcebergSourceNestedORCDataReadBenchmark
+A benchmark that evaluates the performance of reading nested ORC data 
using Iceberg and the built-in file source in Spark. To run this benchmark for 
either spark-2 or spark-3:
+
+`./gradlew :iceberg-spark[2|3]:jmh 
-PjmhIncludeRegex=IcebergSourceNestedORCDataReadBenchmark 
-PjmhOutputPath=benchmark/iceberg-source-nested-orc-data-read-benchmark-result.txt`
+
+### IcebergSourceFlatParquetDataReadBenchmark
+A benchmark that evaluates the performance of reading Parquet data with a flat 
schema using Iceberg and the built-in file source in Spark. To run this 
benchmark for either spark-2 or spark-3:
+
+`./gradlew :iceberg-spark[2|3]:jmh 
-PjmhIncludeRegex=IcebergSourceFlatParquetDataReadBenchmark 
-PjmhOutputPath=benchmark/iceberg-source-flat-parquet-data-read-benchmark-result.txt`
+
+### IcebergSourceFlatParquetDataFilterBenchmark
+A benchmark that evaluates the file skipping capabilities in the Spark data 
source for Iceberg. This class uses a dataset with a flat schema, where the 
records are clustered according to the
+column used in the filter predicate. The performance is compared to the 
built-in file source in Spark. To run this benchmark for either spark-2 or 
spark-3:
+
+`./gradlew :iceberg-spark[2|3]:jmh 
-PjmhIncludeRegex=IcebergSourceFlatParquetDataFilterBenchmark 
-PjmhOutputPath=benchmark/iceberg-source-flat-parquet-data-filter-benchmark-result.txt`
+
+### IcebergSourceNestedParquetDataFilterBenchmark
+A benchmark that evaluates the file skipping capabilities in the Spark data 
source for Iceberg. This class uses a dataset with nested data, where the 
records are clustered according to the
+column used in the filter predicate. The performance is compared to the 
built-in file source in Spark. To run this benchmark for either spark-2 or 
spark-3:
+`./gradlew :iceberg-spark[2|3]:jmh 
-PjmhIncludeRegex=IcebergSourceNestedParquetDataFilterBenchmark 
-PjmhOutputPath=benchmark/iceberg-source-nested-parquet-data-filter-benchmark-result.txt`
+
+### SparkParquetWritersNestedDataBenchmark
+A benchmark that evaluates the performance of writing nested Parquet data 
using Iceberg and Spark Parquet writers. To run this benchmark for either 
spark-2 or spark-3:
+`./gradlew :iceberg-spark[2|3]:jmh 
-PjmhIncludeRegex=SparkParquetWritersNestedDataBenchmark 
-PjmhOutputPath=benchmark/spark-parquet-writers-nested-data-benchmark-result.txt`
diff --git a/site/docs/community.md b/site/docs/community.md
index 372a2c0..803a1b7 100644
--- a/site/docs/community.md
+++ b/site/docs/community.md
@@ -84,3 +84,9 @@ Point to 
[intellij-java-palantir-style.xml](../../.baseline/idea/intellij-java-p
 
 See also the IntelliJ [Code Style 
docs](https://www.jetbrains.com/help/idea/copying-code-style-settings.html) and 
[Reformat Code 
docs](https://www.jetbrains.com/help/idea/reformat-and-rearrange-code.html) for 
additional details.
 
+## Running Benchmarks
+Some PRs/changesets might require running benchmarks to determine whether they 
affect the baseline performance. Currently there is 
+no "push a single button to get a performance comparison" solution available, 
so contributors have to run the JMH performance tests on their local machine and
+post the results on the PR.
+
+See [Benchmarks](benchmarks.md) for a summary of available benchmarks and how 
to run them.
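
Every `./gradlew` invocation listed in benchmarks.md above follows the same pattern: a Spark module (`:iceberg-spark[2|3]`), a `jmhIncludeRegex`, and a `jmhOutputPath`. As a rough sketch of that pattern, the helper below prints the command for one benchmark. The function name `jmh_command` is hypothetical and not part of the Iceberg build, and it assumes `[2|3]` in the docs stands for the two modules `:iceberg-spark2` and `:iceberg-spark3`.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Iceberg build): prints the Gradle
# command for a single benchmark, following the pattern used in benchmarks.md.
jmh_command() {
  spark_version="$1"   # 2 or 3; assumed to select :iceberg-spark2 or :iceberg-spark3
  benchmark="$2"       # benchmark class name, passed as jmhIncludeRegex
  output_path="$3"     # file that should receive the benchmark results

  # Reject anything other than the two Spark versions the docs mention.
  case "$spark_version" in
    2|3) ;;
    *) echo "spark version must be 2 or 3" >&2; return 1 ;;
  esac

  printf './gradlew :iceberg-spark%s:jmh -PjmhIncludeRegex=%s -PjmhOutputPath=%s\n' \
    "$spark_version" "$benchmark" "$output_path"
}

# Example: print the command for one of the benchmarks listed above on spark-3.
jmh_command 3 SparkParquetReadersFlatDataBenchmark \
  benchmark/spark-parquet-readers-flat-data-benchmark-result.txt
```

The helper only prints the command rather than executing it, so the output can be reviewed or pasted into a shell at the repository root before starting a long benchmark run.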
