This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
     new 9f1e8afbe50 [SPARK-42767][CONNECT][TESTS] Add a precondition to start connect server fallback with `in-memory` and auto ignored some tests strongly depend on hive
9f1e8afbe50 is described below

commit 9f1e8afbe500b71a8e56047380f850a257d56822
Author: yangjie01 <[email protected]>
AuthorDate: Thu Mar 16 13:04:22 2023 +0900

    [SPARK-42767][CONNECT][TESTS] Add a precondition to start connect server fallback with `in-memory` and auto ignored some tests strongly depend on hive
    
    ### What changes were proposed in this pull request?
    This PR adds a precondition check before `RemoteSparkSession` starts the connect server: it verifies that `spark-hive-**.jar` exists in the `assembly/target/scala-*/jars` directory, and falls back to starting the connect server with `spark.sql.catalogImplementation=in-memory` if `spark-hive-**.jar` doesn't exist.
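    The check described above can be sketched as follows. This is a simplified, standalone version for illustration only: the real helper is `IntegrationTestUtils.isSparkHiveJarAvailable` in the diff below, and the `sparkHome`, `scalaVersion`, and `sparkVersion` parameters here are placeholders for values the real code derives from the build environment.

    ```scala
    import java.nio.file.{Files, Paths}

    object HiveJarCheck {
      // Look for the spark-hive jar under the assembly output directory and
      // choose the catalog implementation accordingly. All three parameters
      // are illustrative stand-ins for values resolved by the test utilities.
      def catalogImplementation(
          sparkHome: String,
          scalaVersion: String,
          sparkVersion: String): String = {
        val jar = Paths.get(
          s"$sparkHome/assembly/target/scala-$scalaVersion/jars/" +
            s"spark-hive_$scalaVersion-$sparkVersion.jar")
        if (Files.exists(jar)) "hive" else "in-memory"
      }
    }
    ```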
    
    When the connect server is started with `spark.sql.catalogImplementation=in-memory`, test cases that strongly rely on the Hive module are ignored instead of failing outright. At the same time, developers see the following message on the terminal:
    
    ```
    [info] ClientE2ETestSuite:
    Will start Spark Connect server with `spark.sql.catalogImplementation=in-memory`, some tests that rely on Hive will be ignored. If you don't want to skip them:
    1. Test with maven: run `build/mvn install -DskipTests -Phive` before testing
    2. Test with sbt: run test with `-Phive` profile
    ```
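    The skipping mechanism uses ScalaTest's `assume`, as shown in the diff below: when the condition is false, `assume` throws `TestCanceledException`, so the test is reported as canceled rather than failed (which is why the "After" runs report `canceled 2`). A minimal sketch, with an illustrative suite and a hard-coded flag standing in for `IntegrationTestUtils.isSparkHiveJarAvailable`:

    ```scala
    import org.scalatest.funsuite.AnyFunSuite

    class HiveDependentSuite extends AnyFunSuite {
      // Illustrative stand-in for IntegrationTestUtils.isSparkHiveJarAvailable.
      val isSparkHiveJarAvailable: Boolean = false

      test("a test that needs Hive") {
        // When the condition is false, assume() throws TestCanceledException
        // and the test counts as "canceled", not "failed".
        assume(isSparkHiveJarAvailable)
        // ... Hive-dependent assertions would go here ...
      }
    }
    ```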
    
    ### Why are the changes needed?
    Avoid hard failures of the connect client module UTs when the Hive-related dependencies are missing.
    
    ### Does this PR introduce _any_ user-facing change?
    No, this is a test-only change.
    
    ### How was this patch tested?
    - Manually checked that testing with `-Phive` behaves the same as before
    - Manual test:
      - Maven
    
    run
    ```
    build/mvn clean install -DskipTests
    build/mvn test -pl connector/connect/client/jvm
    ```
    
    **Before**
    
    ```
    Run completed in 14 seconds, 999 milliseconds.
    Total number of tests run: 684
    Suites: completed 12, aborted 0
    Tests: succeeded 678, failed 6, canceled 0, ignored 1, pending 0
    *** 6 TESTS FAILED ***
    ```
    
    **After**
    
    ```
    Discovery starting.
    Discovery completed in 761 milliseconds.
    Run starting. Expected test count is: 684
    ClientE2ETestSuite:
    Will start Spark Connect server with `spark.sql.catalogImplementation=in-memory`, some tests that rely on Hive will be ignored. If you don't want to skip them:
    1. Test with maven: run `build/mvn install -DskipTests -Phive` before testing
    2. Test with sbt: run test with `-Phive` profile
    ...
    Run completed in 15 seconds, 994 milliseconds.
    Total number of tests run: 682
    Suites: completed 12, aborted 0
    Tests: succeeded 682, failed 0, canceled 2, ignored 1, pending 0
    All tests passed.
    ```
    
      - SBT
    
    run `build/sbt clean "connect-client-jvm/test"`
    
    **Before**
    
    ```
    [info] ClientE2ETestSuite:
    [info] org.apache.spark.sql.ClientE2ETestSuite *** ABORTED *** (1 minute, 3 seconds)
    [info]   java.lang.RuntimeException: Failed to start the test server on port 15960.
    [info]   at org.apache.spark.sql.connect.client.util.RemoteSparkSession.beforeAll(RemoteSparkSession.scala:129)
    [info]   at org.apache.spark.sql.connect.client.util.RemoteSparkSession.beforeAll$(RemoteSparkSession.scala:120)
    [info]   at org.apache.spark.sql.ClientE2ETestSuite.beforeAll(ClientE2ETestSuite.scala:37)
    [info]   at org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:212)
    [info]   at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
    [info]   at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
    [info]   at org.apache.spark.sql.ClientE2ETestSuite.run(ClientE2ETestSuite.scala:37)
    [info]   at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:321)
    [info]   at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:517)
    [info]   at sbt.ForkMain$Run.lambda$runTest$1(ForkMain.java:413)
    [info]   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    [info]   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    [info]   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    [info]   at java.lang.Thread.run(Thread.java:750)
    
    ```
    
    **After**
    
    ```
    [info] ClientE2ETestSuite:
    Will start Spark Connect server with `spark.sql.catalogImplementation=in-memory`, some tests that rely on Hive will be ignored. If you don't want to skip them:
    1. Test with maven: run `build/mvn install -DskipTests -Phive` before testing
    2. Test with sbt: run test with `-Phive` profile
    ....
    [info] Run completed in 22 seconds, 44 milliseconds.
    [info] Total number of tests run: 682
    [info] Suites: completed 11, aborted 0
    [info] Tests: succeeded 682, failed 0, canceled 2, ignored 1, pending 0
    [info] All tests passed.
    ```
    
    Closes #40389 from LuciferYang/spark-hive-available.
    
    Lead-authored-by: yangjie01 <[email protected]>
    Co-authored-by: YangJie <[email protected]>
    Signed-off-by: Hyukjin Kwon <[email protected]>
    (cherry picked from commit 83b9cbddc0ce1d594b718b061e82c231092db4a7)
    Signed-off-by: Hyukjin Kwon <[email protected]>
---
 .../scala/org/apache/spark/sql/ClientE2ETestSuite.scala    |  2 ++
 .../sql/connect/client/util/IntegrationTestUtils.scala     | 14 +++++++++++---
 .../spark/sql/connect/client/util/RemoteSparkSession.scala | 14 +++++++++++++-
 3 files changed, 26 insertions(+), 4 deletions(-)

diff --git a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/ClientE2ETestSuite.scala b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/ClientE2ETestSuite.scala
index c948f192c90..605b15123c6 100644
--- a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/ClientE2ETestSuite.scala
+++ b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/ClientE2ETestSuite.scala
@@ -56,6 +56,7 @@ class ClientE2ETestSuite extends RemoteSparkSession with SQLHelper {
   }
 
   test("eager execution of sql") {
+    assume(IntegrationTestUtils.isSparkHiveJarAvailable)
     withTable("test_martin") {
       // Fails, because table does not exist.
       assertThrows[StatusRuntimeException] {
@@ -250,6 +251,7 @@ class ClientE2ETestSuite extends RemoteSparkSession with SQLHelper {
   // TODO (SPARK-42519): Revisit this test after we can set configs.
    //  e.g. spark.conf.set("spark.sql.catalog.testcat", classOf[InMemoryTableCatalog].getName)
   test("writeTo with create") {
+    assume(IntegrationTestUtils.isSparkHiveJarAvailable)
     withTable("myTableV2") {
       // Failed to create as Hive support is required.
       spark.range(3).writeTo("myTableV2").create()
diff --git a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/util/IntegrationTestUtils.scala b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/util/IntegrationTestUtils.scala
index f27ea614a7e..a98f7e9c13b 100644
--- a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/util/IntegrationTestUtils.scala
+++ b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/util/IntegrationTestUtils.scala
@@ -17,6 +17,7 @@
 package org.apache.spark.sql.connect.client.util
 
 import java.io.File
+import java.nio.file.{Files, Paths}
 
 import scala.util.Properties.versionNumberString
 
@@ -27,14 +28,15 @@ object IntegrationTestUtils {
   // System properties used for testing and debugging
   private val DEBUG_SC_JVM_CLIENT = "spark.debug.sc.jvm.client"
 
-  private[sql] lazy val scalaDir = {
-    val version = versionNumberString.split('.') match {
+  private[sql] lazy val scalaVersion = {
+    versionNumberString.split('.') match {
       case Array(major, minor, _*) => major + "." + minor
       case _ => versionNumberString
     }
-    "scala-" + version
   }
 
+  private[sql] lazy val scalaDir = s"scala-$scalaVersion"
+
   private[sql] lazy val sparkHome: String = {
    if (!(sys.props.contains("spark.test.home") || sys.env.contains("SPARK_HOME"))) {
       fail("spark.test.home or SPARK_HOME is not set.")
@@ -49,6 +51,12 @@ object IntegrationTestUtils {
   // scalastyle:on println
  private[connect] def debug(error: Throwable): Unit = if (isDebug) error.printStackTrace()
 
+  private[sql] lazy val isSparkHiveJarAvailable: Boolean = {
+    val filePath = s"$sparkHome/assembly/target/$scalaDir/jars/" +
+      s"spark-hive_$scalaVersion-${org.apache.spark.SPARK_VERSION}.jar"
+    Files.exists(Paths.get(filePath))
+  }
+
   /**
   * Find a jar in the Spark project artifacts. It requires a build first (e.g. build/sbt package,
   * build/mvn clean install -DskipTests) so that this method can find the jar in the target
diff --git a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/util/RemoteSparkSession.scala b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/util/RemoteSparkSession.scala
index beae5bfa27e..d1a34603f48 100644
--- a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/util/RemoteSparkSession.scala
+++ b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/util/RemoteSparkSession.scala
@@ -62,6 +62,18 @@ object SparkConnectServerUtils {
       "connector/connect/server",
       "spark-connect-assembly",
       "spark-connect").getCanonicalPath
+    val catalogImplementation = if (IntegrationTestUtils.isSparkHiveJarAvailable) {
+      "hive"
+    } else {
+      // scalastyle:off println
+      println(
+        "Will start Spark Connect server with `spark.sql.catalogImplementation=in-memory`, " +
+          "some tests that rely on Hive will be ignored. If you don't want to skip them:\n" +
+          "1. Test with maven: run `build/mvn install -DskipTests -Phive` before testing\n" +
+          "2. Test with sbt: run test with `-Phive` profile")
+          "2. Test with sbt: run test with `-Phive` profile")
+      // scalastyle:on println
+      "in-memory"
+    }
     val builder = Process(
       Seq(
         "bin/spark-submit",
@@ -72,7 +84,7 @@ object SparkConnectServerUtils {
         "--conf",
         
"spark.sql.catalog.testcat=org.apache.spark.sql.connect.catalog.InMemoryTableCatalog",
         "--conf",
-        "spark.sql.catalogImplementation=hive",
+        s"spark.sql.catalogImplementation=$catalogImplementation",
         "--class",
         "org.apache.spark.sql.connect.SimpleSparkConnectService",
         jar),


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
