(kyuubi) branch master updated: [KYUUBI #5925] Kyuubi TPC-DS support running benchmark with skipping some queries

chengpan Thu, 28 Dec 2023 01:02:01 -0800

This is an automated email from the ASF dual-hosted git repository.

chengpan pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/kyuubi.git



The following commit(s) were added to refs/heads/master by this push:
     new e49af52f4 [KYUUBI #5925] Kyuubi TPC-DS support running benchmark with skipping some queries
e49af52f4 is described below

commit e49af52f4c6e59a9659bfebf39f44e620c1fe232
Author: haorenhui <[email protected]>
AuthorDate: Thu Dec 28 17:01:46 2023 +0800

    [KYUUBI #5925] Kyuubi TPC-DS support running benchmark with skipping some queries
    
    # :mag: Description
    ## Issue References 🔗
    
    When running Kyuubi's TPC-DS benchmark, some queries run slowly, but there is no parameter to skip them.
    
    ## Describe Your Solution 🔧
    
    Add `--include` and `--exclude` parameters, each taking a comma-separated list of query names.
    
    ## Types of changes :bookmark:
    
    - [ ] Bugfix (non-breaking change which fixes an issue)
    - [ ] New feature (non-breaking change which adds functionality)
    - [x] Breaking change (fix or feature that would cause existing functionality to change)
    
    ## Test Plan 🧪
    
    #### Behavior Without This Pull Request :coffin:
    There is no parameter to skip queries.
    
    #### Behavior With This Pull Request :tada:
    ```
    $SPARK_HOME/bin/spark-submit \
      --class org.apache.kyuubi.tpcds.benchmark.RunBenchmark \
      kyuubi-tpcds_*.jar --db tpcds_sf10 --exclude q2,q4
    ```
    
    > == QUERY LIST ==
    > q1-v2.4
    > q3-v2.4
    > q5-v2.4
    > q6-v2.4
    > q7-v2.4
    > q8-v2.4
    > q9-v2.4
    > .....
    
    #### Related Unit Tests
    
    ---
    
    # Checklists
    ## 📝 Author Self Checklist
    
    - [x] My code follows the [style guidelines](https://kyuubi.readthedocs.io/en/master/contributing/code/style.html) of this project
    - [x] I have performed a self-review
    - [x] I have commented my code, particularly in hard-to-understand areas
    - [x] I have made corresponding changes to the documentation
    - [ ] My changes generate no new warnings
    - [ ] I have added tests that prove my fix is effective or that my feature works
    - [ ] New and existing unit tests pass locally with my changes
    - [ ] This patch was not authored or co-authored using [Generative Tooling](https://www.apache.org/legal/generative-tooling.html)
    
    ## 📝 Committer Pre-Merge Checklist
    
    - [ ] Pull request title is okay.
    - [ ] No license issues.
    - [ ] Milestone correctly set?
    - [ ] Test coverage is ok
    - [ ] Assignees are selected.
    - [ ] Minimum number of approvals
    - [ ] No changes are requested
    
    **Be nice. Be informative.**
    
    Closes #5925 from rhh777/tpcds-support-skip-queries.
    
    Closes #5925
    
    682f30ce8 [haorenhui] Update some descriptions
    cd90fb597 [haorenhui] Use include(list) and exclude(list) to replace filter(string)/queries(list)/skip(list)
    13744e57e [haorenhui] kyuubi tpcds RunBenchmark support skip some of the queries
    
    Authored-by: haorenhui <[email protected]>
    Signed-off-by: Cheng Pan <[email protected]>
---
 dev/kyuubi-tpcds/README.md                         | 62 +++++++++++++++++-----
 .../kyuubi/tpcds/benchmark/RunBenchmark.scala      | 37 ++++++-------
 2 files changed, 68 insertions(+), 31 deletions(-)

diff --git a/dev/kyuubi-tpcds/README.md b/dev/kyuubi-tpcds/README.md
index a9a6487aa..717c1b0ed 100644
--- a/dev/kyuubi-tpcds/README.md
+++ b/dev/kyuubi-tpcds/README.md
@@ -48,14 +48,15 @@ $SPARK_HOME/bin/spark-submit \
 
 Support options:
 
-|     key     |        default         |                          description                          |
-|-------------|------------------------|---------------------------------------------------------------|
-| db          | none(required)         | the TPC-DS database                                           |
-| benchmark   | tpcds-v2.4-benchmark   | the name of application                                       |
-| iterations  | 3                      | the number of iterations to run                               |
-| breakdown   | false                  | whether to record breakdown results of an execution           |
-| filter      | a                      | filter on the name of the queries to run, e.g. q1-v2.4        |
-| results-dir | /spark/sql/performance | dir to store benchmark results, e.g. hdfs://hdfs-nn:9870/pref |
+| key         | default                | description                                                                   |
+|-------------|------------------------|-------------------------------------------------------------------------------|
+| db          | none(required)         | the TPC-DS database                                                           |
+| benchmark   | tpcds-v2.4-benchmark   | the name of application                                                       |
+| iterations  | 3                      | the number of iterations to run                                               |
+| breakdown   | false                  | whether to record breakdown results of an execution                           |
+| results-dir | /spark/sql/performance | dir to store benchmark results, e.g. hdfs://hdfs-nn:9870/pref                 |
+| include     | none(optional)         | name of the queries to run, use comma to split multiple names, e.g. q1,q2     |
+| exclude     | none(optional)         | name of the queries to exclude, use comma to split multiple names, e.g. q2,q4 |
 
 Example: the following command to benchmark TPC-DS sf10 with exists database `tpcds_sf10`.
 
@@ -65,17 +66,52 @@ $SPARK_HOME/bin/spark-submit \
   kyuubi-tpcds_*.jar --db tpcds_sf10
 ```
 
-We also support run one of the TPC-DS query:
+We also support run specified SQL collections of the TPC-DS query:
 
 ```shell
 $SPARK_HOME/bin/spark-submit \
   --class org.apache.kyuubi.tpcds.benchmark.RunBenchmark \
-  kyuubi-tpcds_*.jar --db tpcds_sf10 --filter q1-v2.4
+  kyuubi-tpcds_*.jar --db tpcds_sf10 --include q1,q2
 ```
 
 The result of TPC-DS benchmark like:
 
-|  name   | minTimeMs | maxTimeMs  | avgTimeMs  |  stdDev  | stdDevPercent  |
-|---------|-----------|------------|------------|----------|----------------|
-| q1-v2.4 | 50.522384 | 868.010383 | 323.398267 | 471.6482 | 145.8413108576 |
+| name    | minTimeMs    | maxTimeMs    | avgTimeMs        | stdDev           | stdDevPercent    |
+|---------|--------------|--------------|------------------|------------------|------------------|
+| q1-v2.4 | 8329.884508  | 14159.307004 | 10537.235825     | 3161.74253777417 | 30.0054263782615 |
+| q2-v2.4 | 16600.979609 | 18932.613523 | 18137.6516166666 | 1331.06332796139 | 7.33867512781137 |
 
+If you want to exclude some SQL, you can use exclude:
+
+```shell
+$SPARK_HOME/bin/spark-submit \
+  --class org.apache.kyuubi.tpcds.benchmark.RunBenchmark \
+  kyuubi-tpcds_*.jar --db tpcds_sf10 --exclude q2,q4
+```
+
+The result of TPC-DS benchmark like:
+
+| name     | minTimeMs    | maxTimeMs    | avgTimeMs        | stdDev           | stdDevPercent     |
+|----------|--------------|--------------|------------------|------------------|-------------------|
+| q1-v2.4  | 8329.884508  | 14159.307004 | 10537.235825     | 3161.74253777417 | 30.0054263782615  |
+| q3-v2.4  | 3841.009061  | 4685.16345   | 4128.583224      | 482.102016761038 | 11.6771781166603  |
+| q5-v2.4  | 39405.654981 | 48845.359253 | 43530.6847113333 | 4830.98802198401 | 11.0978911864583  |
+| q6-v2.4  | 2998.962221  | 7793.096796  | 4658.37355366666 | 2716.310089792   | 58.3102677039276  |
+| ...      | ...          | ...          | ...              | ...              | ...               |
+| q99-v2.4 | 11747.22389  | 11900.570288 | 11813.018609     | 78.9544389266673 | 0.668368022941351 |
+
+When both include and exclude exist simultaneously, the final SQL collections executed is include minus exclude:
+
+```shell
+$SPARK_HOME/bin/spark-submit \
+  --class org.apache.kyuubi.tpcds.benchmark.RunBenchmark \
+  kyuubi-tpcds_*.jar --db tpcds_sf10 --include q1,q2,q3,q4,q5 --exclude q2,q4
+```
+
+The result of TPC-DS benchmark like:
+
+| name     | minTimeMs    | maxTimeMs    | avgTimeMs        | stdDev           | stdDevPercent     |
+|----------|--------------|--------------|------------------|------------------|-------------------|
+| q1-v2.4  | 8329.884508  | 14159.307004 | 10537.235825     | 3161.74253777417 | 30.0054263782615  |
+| q3-v2.4  | 3841.009061  | 4685.16345   | 4128.583224      | 482.102016761038 | 11.6771781166603  |
+| q5-v2.4  | 39405.654981 | 48845.359253 | 43530.6847113333 | 4830.98802198401 | 11.0978911864583  |
\ No newline at end of file
diff --git a/dev/kyuubi-tpcds/src/main/scala/org/apache/kyuubi/tpcds/benchmark/RunBenchmark.scala b/dev/kyuubi-tpcds/src/main/scala/org/apache/kyuubi/tpcds/benchmark/RunBenchmark.scala
index 3e2106cff..80f742294 100644
--- a/dev/kyuubi-tpcds/src/main/scala/org/apache/kyuubi/tpcds/benchmark/RunBenchmark.scala
+++ b/dev/kyuubi-tpcds/src/main/scala/org/apache/kyuubi/tpcds/benchmark/RunBenchmark.scala
@@ -26,11 +26,11 @@ import org.apache.spark.sql.functions._
 case class RunConfig(
     db: String = null,
     benchmarkName: String = "tpcds-v2.4-benchmark",
-    filter: Option[String] = None,
     iterations: Int = 3,
     breakdown: Boolean = false,
     resultsDir: String = "/spark/sql/performance",
-    queries: Set[String] = Set.empty)
+    include: Set[String] = Set.empty,
+    exclude: Set[String] = Set.empty)
 
 // scalastyle:off
 /**
@@ -54,9 +54,6 @@ object RunBenchmark {
       opt[String]('b', "benchmark")
         .action { (x, c) => c.copy(benchmarkName = x) }
         .text("the name of the benchmark to run")
-      opt[String]('f', "filter")
-        .action((x, c) => c.copy(filter = Some(x)))
-        .text("a filter on the name of the queries to run")
       opt[Boolean]('B', "breakdown")
         .action((x, c) => c.copy(breakdown = x))
         .text("whether to record breakdown results of an execution")
@@ -66,11 +63,16 @@ object RunBenchmark {
       opt[String]('r', "results-dir")
         .action((x, c) => c.copy(resultsDir = x))
         .text("dir to store benchmark results, e.g. hdfs://hdfs-nn:9870/pref")
-      opt[String]('q', "queries")
+      opt[String]("include")
         .action { case (x, c) =>
-          c.copy(queries = x.split(",").map(_.trim).filter(_.nonEmpty).toSet)
+          c.copy(include = x.split(",").map(_.trim).filter(_.nonEmpty).toSet)
         }
-        .text("name of the queries to run, use , split multiple name")
+        .text("name of the queries to run, use comma to split multiple names, e.g. q1,q2")
+      opt[String]("exclude")
+        .action { case (x, c) =>
+          c.copy(exclude = x.split(",").map(_.trim).filter(_.nonEmpty).toSet)
+        }
+        .text("name of the queries to exclude, use comma to split multiple names, e.g. q2,q4")
       help("help")
         .text("prints this usage text")
     }
@@ -96,19 +98,18 @@ object RunBenchmark {
     println(config.db)
     sparkSession.sql(s"use ${config.db}")
 
-    val allQueries = config.filter.map { f =>
-      benchmark.tpcds2_4Queries.filter(_.name contains f)
-    } getOrElse {
-      benchmark.tpcds2_4Queries
-    }
-
-    val runQueries =
-      if (config.queries.nonEmpty) {
-        allQueries.filter(q => config.queries.contains(q.name.split('-')(0)))
+    var runQueries =
+      if (config.include.nonEmpty) {
+        benchmark.tpcds2_4Queries.filter(q => config.include.contains(q.name.split('-')(0)))
       } else {
-        allQueries
+        benchmark.tpcds2_4Queries
       }
 
+    // runQueries = include - exclude
+    if (config.exclude.nonEmpty) {
+      runQueries = runQueries.filterNot(q => config.exclude.contains(q.name.split('-')(0)))
+    }
+
     println("== QUERY LIST ==")
     runQueries.foreach(q => println(q.name))
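
The selection rule introduced by this commit (include, then minus exclude) can be sketched in isolation as below. This is a minimal standalone illustration, not the code from RunBenchmark.scala: `Query`, `QuerySelection`, `parseNames`, `select`, and `QuerySelectionDemo` are hypothetical names made up for the example, and it uses an immutable value where the patch uses a `var`. An empty include set is treated as "all queries", as in the diff.

```scala
// Hypothetical stand-in for the benchmark's query type; only `name` matters here.
final case class Query(name: String)

object QuerySelection {
  // Split a comma-separated option value into a set of trimmed, non-empty names.
  def parseNames(raw: String): Set[String] =
    raw.split(",").map(_.trim).filter(_.nonEmpty).toSet

  // "q1-v2.4" -> "q1": the short name is the part before the first '-'.
  private def shortName(q: Query): String = q.name.split('-')(0)

  // Keep the included queries (all of them if `include` is empty), then drop `exclude`.
  def select(all: Seq[Query], include: Set[String], exclude: Set[String]): Seq[Query] = {
    val included =
      if (include.nonEmpty) all.filter(q => include.contains(shortName(q))) else all
    included.filterNot(q => exclude.contains(shortName(q)))
  }
}

object QuerySelectionDemo {
  def main(args: Array[String]): Unit = {
    // Assumed sample data: five query names in the q<N>-v2.4 form used by the README.
    val all = (1 to 5).map(i => Query(s"q$i-v2.4"))
    val include = QuerySelection.parseNames("q1,q2,q3,q4,q5")
    val exclude = QuerySelection.parseNames(" q2, q4 ,")
    println("== QUERY LIST ==")
    QuerySelection.select(all, include, exclude).foreach(q => println(q.name))
  }
}
```

With these inputs the sketch prints q1-v2.4, q3-v2.4 and q5-v2.4, matching the README example for `--include q1,q2,q3,q4,q5 --exclude q2,q4`. Matching is on the short name before the first `-`, so `q1` selects `q1-v2.4` but not `q10-v2.4`, unlike the substring match of the removed `--filter` option.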
