This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.2 by this push:
     new e7aa9671276 [SPARK-39856][SQL][TESTS] Increase the number of 
partitions in TPC-DS build to avoid out-of-memory
e7aa9671276 is described below

commit e7aa9671276c435cca851fc53931db302b64bbac
Author: Hyukjin Kwon <[email protected]>
AuthorDate: Mon Jul 25 12:44:54 2022 +0900

    [SPARK-39856][SQL][TESTS] Increase the number of partitions in TPC-DS build 
to avoid out-of-memory
    
    This PR proposes to avoid out-of-memory in TPC-DS build at GitHub Actions 
CI by:
    
    - Increasing the number of partitions being used in shuffle.
    - Truncating floating-point values after the 10th decimal place.
        The number of partitions was previously set to 1 because of minor
precision differences in results that we can generally just ignore.
    - Sorting the results regardless of join type, since Apache Spark does not
guarantee the order of results.
    
    One of the reasons for the large memory usage seems to be single partition 
that's being used in the shuffle.
    
    No, test-only.
    
    The GitHub Actions CI run for this PR will test it out.
    
    Closes #37270 from HyukjinKwon/deflake-tpcds.
    
    Authored-by: Hyukjin Kwon <[email protected]>
    Signed-off-by: Hyukjin Kwon <[email protected]>
    (cherry picked from commit 7358253755762f9bfe6cedc1a50ec14616cfeace)
    Signed-off-by: Hyukjin Kwon <[email protected]>
---
 .../test/scala/org/apache/spark/sql/TPCDSQueryTestSuite.scala    | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/TPCDSQueryTestSuite.scala 
b/sql/core/src/test/scala/org/apache/spark/sql/TPCDSQueryTestSuite.scala
index 952e8968020..4b1a2626ef4 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/TPCDSQueryTestSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/TPCDSQueryTestSuite.scala
@@ -58,7 +58,7 @@ class TPCDSQueryTestSuite extends QueryTest with TPCDSBase 
with SQLQueryTestHelp
 
   // To make output results deterministic
   protected override def sparkConf: SparkConf = super.sparkConf
-    .set(SQLConf.SHUFFLE_PARTITIONS.key, "1")
+    .set(SQLConf.SHUFFLE_PARTITIONS.key, 4.toString)
 
   protected override def createSparkSession: TestSparkSession = {
     new TestSparkSession(new SparkContext("local[1]", 
this.getClass.getSimpleName, sparkConf))
@@ -100,7 +100,9 @@ class TPCDSQueryTestSuite extends QueryTest with TPCDSBase 
with SQLQueryTestHelp
   private def runQuery(query: String, goldenFile: File): Unit = {
     val (schema, output) = handleExceptions(getNormalizedResult(spark, query))
     val queryString = query.trim
-    val outputString = output.mkString("\n").replaceAll("\\s+$", "")
+    val outputString = output.mkString("\n")
+      .replaceAll("\\s+$", "")
+      .replaceAll("""([0-9]+.[0-9]{10})([0-9]*)""", "$1")
     if (regenerateGoldenFiles) {
       val goldenOutput = {
         s"-- Automatically generated by ${getClass.getSimpleName}\n\n" +
@@ -127,7 +129,8 @@ class TPCDSQueryTestSuite extends QueryTest with TPCDSBase 
with SQLQueryTestHelp
         s"Expected 3 blocks in result file but got ${segments.size}. " +
           "Try regenerate the result files.")
 
-      (segments(1).trim, segments(2).replaceAll("\\s+$", ""))
+      (segments(1).trim, segments(2)
+        .replaceAll("\\s+$", "").replaceAll("""([0-9]+.[0-9]{10})([0-9]*)""", 
"$1"))
     }
 
     assertResult(expectedSchema, s"Schema did not match\n$queryString") { 
schema }


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to