parthchandra commented on code in PR #863:
URL: https://github.com/apache/datafusion-comet/pull/863#discussion_r1728232376
##########
spark/src/test/scala/org/apache/spark/sql/benchmark/CometTPCDSMicroBenchmark.scala:
##########
@@ -104,15 +104,27 @@ object CometTPCDSMicroBenchmark extends CometTPCQueryBenchmarkBase {
}
val numRows = queryRelations.map(tableSizes.getOrElse(_, 0L)).sum
val benchmark = new Benchmark(benchmarkName, numRows, 2, output = output)
- benchmark.addCase(s"$name$nameSuffix") { _ =>
+ benchmark.addCase(s"$name$nameSuffix: Spark Scan + Spark Exec") { _ =>
cometSpark.sql(queryString).noop()
}
- benchmark.addCase(s"$name$nameSuffix: Comet (Scan)") { _ =>
+ benchmark.addCase(s"$name$nameSuffix: Comet Scan + Spark Exec") { _ =>
withSQLConf(CometConf.COMET_ENABLED.key -> "true") {
cometSpark.sql(queryString).noop()
}
}
- benchmark.addCase(s"$name$nameSuffix: Comet (Scan, Exec)") { _ =>
+ benchmark.addCase(s"$name$nameSuffix: Comet Scan + Comet Exec") { _ =>
Review Comment:
As long as the name remains consistent across commits once we start
recording the benchmark results, it shouldn't matter.
Here's an [example from
Spark](https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/AggregateBenchmark.scala)
btw.
But if it helps to standardize, sure.
##########
spark/src/test/scala/org/apache/spark/sql/benchmark/CometAggregateBenchmark.scala:
##########
@@ -28,7 +28,7 @@ import org.apache.comet.CometConf
/**
* Benchmark to measure Comet execution performance. To run this benchmark:
* {{{
- * SPARK_GENERATE_BENCHMARK_FILES=1 make benchmark-org.apache.spark.sql.benchmark.CometAggregateBenchmark
+ * make benchmark-org.apache.spark.sql.benchmark.CometAggregateBenchmark
Review Comment:
At some point, like Spark, we will enable this for Comet so that updated
benchmark results are included as part of the PR, which will let us track
changes to the results over time.
##########
spark/src/test/scala/org/apache/spark/sql/benchmark/CometAggregateBenchmark.scala:
##########
@@ -28,7 +28,7 @@ import org.apache.comet.CometConf
/**
* Benchmark to measure Comet execution performance. To run this benchmark:
* {{{
- * SPARK_GENERATE_BENCHMARK_FILES=1 make benchmark-org.apache.spark.sql.benchmark.CometAggregateBenchmark
+ * make benchmark-org.apache.spark.sql.benchmark.CometAggregateBenchmark
Review Comment:
This environment variable is used in Spark's
[BenchmarkBase](https://github.com/apache/spark/blob/97e0a88b1d04a8e33206e5cb8384878ca4cdfc5a/core/src/test/scala/org/apache/spark/benchmark/BenchmarkBase.scala#L51),
which is the parent of this benchmark (and others).
Its effect is to write the benchmark output to a file under the `spark/benchmarks` directory.
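As a rough sketch of that behavior (this is not the actual Scala code in BenchmarkBase, just an illustration of the gating the env variable provides):

```shell
# Hypothetical sketch: when SPARK_GENERATE_BENCHMARK_FILES=1 is set,
# benchmark results go to a file under spark/benchmarks/ so they can be
# committed with the PR; otherwise they are just printed to stdout.
if [ "${SPARK_GENERATE_BENCHMARK_FILES:-0}" = "1" ]; then
  echo "writing results under spark/benchmarks/"
else
  echo "printing results to stdout"
fi
```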
In Spark, for any PR that has a performance impact, the contributor can run
the benchmarks in their GitHub environment and [include the benchmark
results](https://spark.apache.org/developer-tools.html#github-workflow-benchmarks)
as part of the PR. This lets reviewers compare the performance against
previously recorded benchmark runs. Since the benchmarks run in GitHub's
environment, everyone ends up running them on the same hardware and in the
same environment, so the results are actually comparable.
Note that this is also the reason why, once a benchmark case has been given
a name, the name should not be changed: the case is named solely to allow
this comparison.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]