parthchandra commented on code in PR #863:
URL: https://github.com/apache/datafusion-comet/pull/863#discussion_r1728232376
##########
spark/src/test/scala/org/apache/spark/sql/benchmark/CometTPCDSMicroBenchmark.scala:
##########
@@ -104,15 +104,27 @@ object CometTPCDSMicroBenchmark extends CometTPCQueryBenchmarkBase {
}
val numRows = queryRelations.map(tableSizes.getOrElse(_, 0L)).sum
val benchmark = new Benchmark(benchmarkName, numRows, 2, output = output)
- benchmark.addCase(s"$name$nameSuffix") { _ =>
+ benchmark.addCase(s"$name$nameSuffix: Spark Scan + Spark Exec") { _ =>
cometSpark.sql(queryString).noop()
}
- benchmark.addCase(s"$name$nameSuffix: Comet (Scan)") { _ =>
+ benchmark.addCase(s"$name$nameSuffix: Comet Scan + Spark Exec") { _ =>
withSQLConf(CometConf.COMET_ENABLED.key -> "true") {
cometSpark.sql(queryString).noop()
}
}
- benchmark.addCase(s"$name$nameSuffix: Comet (Scan, Exec)") { _ =>
+ benchmark.addCase(s"$name$nameSuffix: Comet Scan + Comet Exec") { _ =>
Review Comment:
As long as the name remains consistent across commits once we start
recording the benchmark results, it shouldn't matter.
Here's an [example from
Spark](https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/AggregateBenchmark.scala)
btw.
But if it helps to standardize, sure.
##########
spark/src/test/scala/org/apache/spark/sql/benchmark/CometAggregateBenchmark.scala:
##########
@@ -28,7 +28,7 @@ import org.apache.comet.CometConf
/**
* Benchmark to measure Comet execution performance. To run this benchmark:
* {{{
- * SPARK_GENERATE_BENCHMARK_FILES=1 make benchmark-org.apache.spark.sql.benchmark.CometAggregateBenchmark
+ * make benchmark-org.apache.spark.sql.benchmark.CometAggregateBenchmark
Review Comment:
At some point, like Spark, we will enable this for Comet so that updated
benchmark results are included as part of the PR, which will let us track
changes to the results over time.
##########
spark/src/test/scala/org/apache/spark/sql/benchmark/CometAggregateBenchmark.scala:
##########
@@ -28,7 +28,7 @@ import org.apache.comet.CometConf
/**
* Benchmark to measure Comet execution performance. To run this benchmark:
* {{{
- * SPARK_GENERATE_BENCHMARK_FILES=1 make benchmark-org.apache.spark.sql.benchmark.CometAggregateBenchmark
+ * make benchmark-org.apache.spark.sql.benchmark.CometAggregateBenchmark
Review Comment:
This environment variable is used in Spark's
[BenchmarkBase](https://github.com/apache/spark/blob/97e0a88b1d04a8e33206e5cb8384878ca4cdfc5a/core/src/test/scala/org/apache/spark/benchmark/BenchmarkBase.scala#L51),
which is the parent of this benchmark (and others).
Its effect is to write the benchmark output to a file under the `spark/benchmarks` directory.
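As a rough sketch of that behavior (this is not the actual Scala code in BenchmarkBase, just an illustration of the gating the env variable provides):

```shell
# Hypothetical sketch: when SPARK_GENERATE_BENCHMARK_FILES=1 is set,
# benchmark results go to a file under spark/benchmarks/ so they can be
# committed with the PR; otherwise they are just printed to stdout.
if [ "${SPARK_GENERATE_BENCHMARK_FILES:-0}" = "1" ]; then
  echo "writing results under spark/benchmarks/"
else
  echo "printing results to stdout"
fi
```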
In Spark, for any PR that has a performance impact, the contributor can run
the benchmarks in their GitHub environment and [include the benchmark
results](https://spark.apache.org/developer-tools.html#github-workflow-benchmarks)
as part of the PR. This lets reviewers compare the performance against
previously recorded benchmark runs. Since the benchmarks run in GitHub's
environment, everyone ends up running them on the same hardware and in the
same environment, so the results are actually comparable.
Note that this is also the reason why, once a benchmark case has been given
a name, the name should not be changed: the case is named solely to allow
this comparison.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]