This is an automated email from the ASF dual-hosted git repository.
gurwls223 pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-3.4 by this push:
new 9f1e8afbe50 [SPARK-42767][CONNECT][TESTS] Add a precondition to start the Connect server with an `in-memory` fallback and automatically ignore tests that strongly depend on Hive
9f1e8afbe50 is described below
commit 9f1e8afbe500b71a8e56047380f850a257d56822
Author: yangjie01 <[email protected]>
AuthorDate: Thu Mar 16 13:04:22 2023 +0900
    [SPARK-42767][CONNECT][TESTS] Add a precondition to start the Connect server with an `in-memory` fallback and automatically ignore tests that strongly depend on Hive
### What changes were proposed in this pull request?
This PR adds a precondition that runs before `RemoteSparkSession` starts the Connect server: it checks whether `spark-hive-**.jar` exists in the `assembly/target/scala-*/jars` directory, and falls back to starting the Connect server with `spark.sql.catalogImplementation=in-memory` when the jar is absent.

When the Connect server is started with `spark.sql.catalogImplementation=in-memory`, test cases that strongly rely on the Hive module are ignored rather than failing abruptly. Developers also see the following message on the terminal:
```
[info] ClientE2ETestSuite:
Will start Spark Connect server with `spark.sql.catalogImplementation=in-memory`, some tests that rely on Hive will be ignored. If you don't want to skip them:
1. Test with maven: run `build/mvn install -DskipTests -Phive` before testing
2. Test with sbt: run test with `-Phive` profile
```
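For context, the jar-availability check and the resulting catalog choice can be sketched as a standalone snippet. This is an illustrative sketch only: `HiveJarCheck`, `isJarAvailable`, and their parameters are hypothetical names invented here, while the actual implementation is `IntegrationTestUtils.isSparkHiveJarAvailable` shown in the diff below.

```scala
import java.nio.file.{Files, Path}

// Illustrative sketch of the precondition: look for a spark-hive jar in a
// jars directory and pick the catalog implementation accordingly. The object
// and method names are hypothetical, not Spark's actual test utilities.
object HiveJarCheck {
  // True if some file in `jarsDir` is named like `<prefix>*.jar`,
  // e.g. spark-hive_2.12-3.4.0.jar for prefix "spark-hive_".
  def isJarAvailable(jarsDir: Path, prefix: String): Boolean = {
    if (!Files.isDirectory(jarsDir)) return false
    val stream = Files.list(jarsDir)
    try {
      stream.anyMatch { p =>
        val name = p.getFileName.toString
        name.startsWith(prefix) && name.endsWith(".jar")
      }
    } finally {
      stream.close()
    }
  }

  // Fall back to the in-memory catalog when the Hive jar is missing.
  def catalogImplementation(jarsDir: Path): String =
    if (isJarAvailable(jarsDir, "spark-hive_")) "hive" else "in-memory"
}
```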
### Why are the changes needed?
Avoid abrupt failures of the Connect client module UTs when the Hive-related dependencies are missing.
### Does this PR introduce _any_ user-facing change?
No, this is a test-only change.
### How was this patch tested?
- Manually checked that testing with the `-Phive` profile behaves the same as before
- Manual test:
  - Maven: run

```
build/mvn clean install -DskipTests
build/mvn test -pl connector/connect/client/jvm
```
**Before**
```
Run completed in 14 seconds, 999 milliseconds.
Total number of tests run: 684
Suites: completed 12, aborted 0
Tests: succeeded 678, failed 6, canceled 0, ignored 1, pending 0
*** 6 TESTS FAILED ***
```
**After**
```
Discovery starting.
Discovery completed in 761 milliseconds.
Run starting. Expected test count is: 684
ClientE2ETestSuite:
Will start Spark Connect server with `spark.sql.catalogImplementation=in-memory`, some tests that rely on Hive will be ignored. If you don't want to skip them:
1. Test with maven: run `build/mvn install -DskipTests -Phive` before testing
2. Test with sbt: run test with `-Phive` profile
...
Run completed in 15 seconds, 994 milliseconds.
Total number of tests run: 682
Suites: completed 12, aborted 0
Tests: succeeded 682, failed 0, canceled 2, ignored 1, pending 0
All tests passed.
```
- SBT: run `build/sbt clean "connect-client-jvm/test"`
**Before**
```
[info] ClientE2ETestSuite:
[info] org.apache.spark.sql.ClientE2ETestSuite *** ABORTED *** (1 minute, 3 seconds)
[info]   java.lang.RuntimeException: Failed to start the test server on port 15960.
[info]   at org.apache.spark.sql.connect.client.util.RemoteSparkSession.beforeAll(RemoteSparkSession.scala:129)
[info]   at org.apache.spark.sql.connect.client.util.RemoteSparkSession.beforeAll$(RemoteSparkSession.scala:120)
[info]   at org.apache.spark.sql.ClientE2ETestSuite.beforeAll(ClientE2ETestSuite.scala:37)
[info]   at org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:212)
[info]   at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
[info]   at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
[info]   at org.apache.spark.sql.ClientE2ETestSuite.run(ClientE2ETestSuite.scala:37)
[info]   at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:321)
[info]   at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:517)
[info]   at sbt.ForkMain$Run.lambda$runTest$1(ForkMain.java:413)
[info]   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
[info]   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[info]   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[info]   at java.lang.Thread.run(Thread.java:750)
```
**After**
```
[info] ClientE2ETestSuite:
Will start Spark Connect server with `spark.sql.catalogImplementation=in-memory`, some tests that rely on Hive will be ignored. If you don't want to skip them:
1. Test with maven: run `build/mvn install -DskipTests -Phive` before testing
2. Test with sbt: run test with `-Phive` profile
....
[info] Run completed in 22 seconds, 44 milliseconds.
[info] Total number of tests run: 682
[info] Suites: completed 11, aborted 0
[info] Tests: succeeded 682, failed 0, canceled 2, ignored 1, pending 0
[info] All tests passed.
```
Closes #40389 from LuciferYang/spark-hive-available.
Lead-authored-by: yangjie01 <[email protected]>
Co-authored-by: YangJie <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
(cherry picked from commit 83b9cbddc0ce1d594b718b061e82c231092db4a7)
Signed-off-by: Hyukjin Kwon <[email protected]>
---
.../scala/org/apache/spark/sql/ClientE2ETestSuite.scala | 2 ++
.../sql/connect/client/util/IntegrationTestUtils.scala | 14 +++++++++++---
.../spark/sql/connect/client/util/RemoteSparkSession.scala | 14 +++++++++++++-
3 files changed, 26 insertions(+), 4 deletions(-)
diff --git a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/ClientE2ETestSuite.scala b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/ClientE2ETestSuite.scala
index c948f192c90..605b15123c6 100644
--- a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/ClientE2ETestSuite.scala
+++ b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/ClientE2ETestSuite.scala
@@ -56,6 +56,7 @@ class ClientE2ETestSuite extends RemoteSparkSession with SQLHelper {
   }
 
   test("eager execution of sql") {
+    assume(IntegrationTestUtils.isSparkHiveJarAvailable)
     withTable("test_martin") {
       // Fails, because table does not exist.
       assertThrows[StatusRuntimeException] {
@@ -250,6 +251,7 @@ class ClientE2ETestSuite extends RemoteSparkSession with SQLHelper {
   // TODO (SPARK-42519): Revisit this test after we can set configs.
   // e.g. spark.conf.set("spark.sql.catalog.testcat", classOf[InMemoryTableCatalog].getName)
   test("writeTo with create") {
+    assume(IntegrationTestUtils.isSparkHiveJarAvailable)
     withTable("myTableV2") {
       // Failed to create as Hive support is required.
       spark.range(3).writeTo("myTableV2").create()
diff --git a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/util/IntegrationTestUtils.scala b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/util/IntegrationTestUtils.scala
index f27ea614a7e..a98f7e9c13b 100644
--- a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/util/IntegrationTestUtils.scala
+++ b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/util/IntegrationTestUtils.scala
@@ -17,6 +17,7 @@
 package org.apache.spark.sql.connect.client.util
 
 import java.io.File
+import java.nio.file.{Files, Paths}
 
 import scala.util.Properties.versionNumberString
@@ -27,14 +28,15 @@ object IntegrationTestUtils {
   // System properties used for testing and debugging
   private val DEBUG_SC_JVM_CLIENT = "spark.debug.sc.jvm.client"
 
-  private[sql] lazy val scalaDir = {
-    val version = versionNumberString.split('.') match {
+  private[sql] lazy val scalaVersion = {
+    versionNumberString.split('.') match {
       case Array(major, minor, _*) => major + "." + minor
       case _ => versionNumberString
     }
-    "scala-" + version
   }
 
+  private[sql] lazy val scalaDir = s"scala-$scalaVersion"
+
   private[sql] lazy val sparkHome: String = {
     if (!(sys.props.contains("spark.test.home") || sys.env.contains("SPARK_HOME"))) {
       fail("spark.test.home or SPARK_HOME is not set.")
@@ -49,6 +51,12 @@ object IntegrationTestUtils {
   // scalastyle:on println
   private[connect] def debug(error: Throwable): Unit = if (isDebug) error.printStackTrace()
 
+  private[sql] lazy val isSparkHiveJarAvailable: Boolean = {
+    val filePath = s"$sparkHome/assembly/target/$scalaDir/jars/" +
+      s"spark-hive_$scalaVersion-${org.apache.spark.SPARK_VERSION}.jar"
+    Files.exists(Paths.get(filePath))
+  }
+
   /**
    * Find a jar in the Spark project artifacts. It requires a build first (e.g. build/sbt package,
    * build/mvn clean install -DskipTests) so that this method can find the jar in the target
diff --git a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/util/RemoteSparkSession.scala b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/util/RemoteSparkSession.scala
index beae5bfa27e..d1a34603f48 100644
--- a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/util/RemoteSparkSession.scala
+++ b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/util/RemoteSparkSession.scala
@@ -62,6 +62,18 @@ object SparkConnectServerUtils {
       "connector/connect/server",
       "spark-connect-assembly",
       "spark-connect").getCanonicalPath
+    val catalogImplementation = if (IntegrationTestUtils.isSparkHiveJarAvailable) {
+      "hive"
+    } else {
+      // scalastyle:off println
+      println(
+        "Will start Spark Connect server with `spark.sql.catalogImplementation=in-memory`, " +
+          "some tests that rely on Hive will be ignored. If you don't want to skip them:\n" +
+          "1. Test with maven: run `build/mvn install -DskipTests -Phive` before testing\n" +
+          "2. Test with sbt: run test with `-Phive` profile")
+      // scalastyle:on println
+      "in-memory"
+    }
     val builder = Process(
       Seq(
         "bin/spark-submit",
@@ -72,7 +84,7 @@ object SparkConnectServerUtils {
         "--conf",
         "spark.sql.catalog.testcat=org.apache.spark.sql.connect.catalog.InMemoryTableCatalog",
         "--conf",
-        "spark.sql.catalogImplementation=hive",
+        s"spark.sql.catalogImplementation=$catalogImplementation",
         "--class",
         "org.apache.spark.sql.connect.SimpleSparkConnectService",
         jar),
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]