This is an automated email from the ASF dual-hosted git repository.
dongjoon pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-3.5 by this push:
new f888d5736801 [SPARK-46525][SQL][TESTS][3.5] Fix `docker-integration-tests` on Apple Silicon
f888d5736801 is described below
commit f888d57368012af552965f91c548277365f3c369
Author: Kent Yao <[email protected]>
AuthorDate: Fri Sep 27 15:42:00 2024 -0700
[SPARK-46525][SQL][TESTS][3.5] Fix `docker-integration-tests` on Apple Silicon
### What changes were proposed in this pull request?
This is a merged backport of SPARK-46525 that preserves the original authorship of yaooqinn and combines the following PRs:
- #44509
- #44612
- #45303
`com.spotify.docker.client` will not add support for Apple Silicon: the project has been archived, and the
[jnr-unixsocket](https://mvnrepository.com/artifact/com.github.jnr/jnr-unixsocket)
0.18 library it depends on is not compatible with Apple Silicon.
Running our Docker integration tests on Apple Silicon fails as follows:
```java
[info] org.apache.spark.sql.jdbc.MariaDBKrbIntegrationSuite *** ABORTED *** (2 seconds, 264 milliseconds)
[info] com.spotify.docker.client.exceptions.DockerException: java.util.concurrent.ExecutionException: com.spotify.docker.client.shaded.javax.ws.rs.ProcessingException: java.lang.UnsatisfiedLinkError: could not load FFI provider jnr.ffi.provider.jffi.Provider
...
[info] Cause: java.lang.IllegalStateException: Can't overwrite cause with java.lang.UnsatisfiedLinkError: java.lang.UnsatisfiedLinkError: /Users/hzyaoqin/spark/target/tmp/jffi15403099445119552969.dylib: dlopen(/Users/hzyaoqin/spark/target/tmp/jffi15403099445119552969.dylib, 0x0001): tried: '/Users/hzyaoqin/spark/target/tmp/jffi15403099445119552969.dylib' (fat file, but missing compatible architecture (have 'i386,x86_64', need 'arm64')), '/System/Volumes/Preboot/Cryptexes/OS/Users/hzyaoqin/spark/target/tmp/jffi15403099445119552969.dylib' (no such file), '/Users/hzyaoqin/spark/target/tmp/jffi15403099445119552969.dylib' (fat file, but missing compatible architecture (have 'i386,x86_64', need 'arm64'))
```
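In this PR, we switch to its alternative, docker-java with the zerodep HTTP transport, to enable Docker-related tests on Apple Silicon:
```xml
<dependency>
  <groupId>com.github.docker-java</groupId>
  <artifactId>docker-java</artifactId>
  <scope>test</scope>
</dependency>
<dependency>
  <groupId>com.github.docker-java</groupId>
  <artifactId>docker-java-transport-zerodep</artifactId>
  <scope>test</scope>
</dependency>
```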
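For reference, here is a minimal Scala sketch of building a docker-java client on the zerodep transport, mirroring the setup this patch adds to `DockerJDBCIntegrationSuite.beforeAll()`. It assumes `docker-java` and `docker-java-transport-zerodep` 3.3.4 (the version pinned in the root `pom.xml`) are on the classpath, and the ping in `main` needs a reachable Docker daemon. The `DockerPingSketch` object is hypothetical and not part of the Spark test suites.

```scala
import com.github.dockerjava.api.DockerClient
import com.github.dockerjava.core.{DefaultDockerClientConfig, DockerClientImpl}
import com.github.dockerjava.zerodep.ZerodepDockerHttpClient

// Hypothetical helper object; not part of the Spark test suites.
object DockerPingSketch {

  // Builds a docker-java client backed by the zerodep HTTP transport.
  // The default config builder picks up DOCKER_HOST, DOCKER_TLS_VERIFY, etc.
  // from the environment, falling back to the platform's default socket.
  def buildClient(): DockerClient = {
    val config = DefaultDockerClientConfig.createDefaultConfigBuilder().build()
    val httpClient = new ZerodepDockerHttpClient.Builder()
      .dockerHost(config.getDockerHost)
      .sslConfig(config.getSSLConfig)
      .build()
    DockerClientImpl.getInstance(config, httpClient)
  }

  def main(args: Array[String]): Unit = {
    val docker = buildClient()
    try {
      // docker-java commands are only sent to the daemon when exec() is called.
      docker.pingCmd().exec()
      println("Docker daemon is reachable")
    } finally {
      docker.close()
    }
  }
}
```

Because the transport is pure Java, no native FFI library such as jffi has to be loaded, which is what failed on arm64 with the Spotify client.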
### Why are the changes needed?
With this patch, developers on Apple Silicon can run the JDBC/Docker integration tests locally instead of waiting on slow GitHub Actions runs.
### Does this PR introduce _any_ user-facing change?
No, this is a developer-only change.
### How was this patch tested?
Passed CI and tested manually on Apple Silicon:
```
$ build/sbt -Pdocker-integration-tests 'docker-integration-tests/testOnly org.apache.spark.sql.jdbc.*MariaDB*'
...
[info] All tests passed.
[success] Total time: 157 s (02:37), completed Sep 27, 2024, 2:45:16 PM
$ build/sbt -Pdocker-integration-tests 'docker-integration-tests/testOnly org.apache.spark.sql.jdbc.*MySQL*'
...
[info] All tests passed.
[success] Total time: 109 s (01:49), completed Sep 27, 2024, 2:48:47 PM
```
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #48289 from dongjoon-hyun/SPARK-46525.
Authored-by: Kent Yao <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
---
connector/docker-integration-tests/pom.xml | 34 ++----
.../spark/sql/jdbc/DB2KrbIntegrationSuite.scala | 15 +--
.../sql/jdbc/DockerJDBCIntegrationSuite.scala | 131 ++++++++++++---------
.../sql/jdbc/MariaDBKrbIntegrationSuite.scala | 18 +--
.../sql/jdbc/PostgresKrbIntegrationSuite.scala | 16 +--
pom.xml | 25 ++--
project/SparkBuild.scala | 3 +-
7 files changed, 130 insertions(+), 112 deletions(-)
diff --git a/connector/docker-integration-tests/pom.xml b/connector/docker-integration-tests/pom.xml
index d655d1a55281..19377b36a612 100644
--- a/connector/docker-integration-tests/pom.xml
+++ b/connector/docker-integration-tests/pom.xml
@@ -46,22 +46,6 @@
</repositories>
<dependencies>
- <dependency>
- <groupId>com.spotify</groupId>
- <artifactId>docker-client</artifactId>
- <scope>test</scope>
- <classifier>shaded</classifier>
- </dependency>
- <dependency>
- <groupId>org.apache.httpcomponents</groupId>
- <artifactId>httpclient</artifactId>
- <scope>test</scope>
- </dependency>
- <dependency>
- <groupId>org.apache.httpcomponents</groupId>
- <artifactId>httpcore</artifactId>
- <scope>test</scope>
- </dependency>
<!-- Necessary in order to avoid errors in log messages: -->
<dependency>
<groupId>com.google.guava</groupId>
@@ -112,14 +96,6 @@
<artifactId>hadoop-minikdc</artifactId>
<scope>test</scope>
</dependency>
- <!-- Although SPARK-28737 upgraded Jersey to 2.29 for JDK11, 'com.spotify.docker-client' still
- uses this repackaged 'jersey-guava'. We add this back for JDK8/JDK11 testing. -->
- <dependency>
- <groupId>org.glassfish.jersey.bundles.repackaged</groupId>
- <artifactId>jersey-guava</artifactId>
- <version>2.25.1</version>
- <scope>test</scope>
- </dependency>
<dependency>
<groupId>org.mariadb.jdbc</groupId>
<artifactId>mariadb-java-client</artifactId>
@@ -167,5 +143,15 @@
<artifactId>mysql-connector-j</artifactId>
<scope>test</scope>
</dependency>
+ <dependency>
+ <groupId>com.github.docker-java</groupId>
+ <artifactId>docker-java</artifactId>
+ <scope>test</scope>
+ </dependency>
+ <dependency>
+ <groupId>com.github.docker-java</groupId>
+ <artifactId>docker-java-transport-zerodep</artifactId>
+ <scope>test</scope>
+ </dependency>
</dependencies>
</project>
diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DB2KrbIntegrationSuite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DB2KrbIntegrationSuite.scala
index 9b518d61d252..66e2afbb6eff 100644
--- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DB2KrbIntegrationSuite.scala
+++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DB2KrbIntegrationSuite.scala
@@ -21,7 +21,7 @@ import java.security.PrivilegedExceptionAction
import java.sql.Connection
import javax.security.auth.login.Configuration
-import com.spotify.docker.client.messages.{ContainerConfig, HostConfig}
+import com.github.dockerjava.api.model.{AccessMode, Bind, ContainerConfig, HostConfig, Volume}
import org.apache.hadoop.security.{SecurityUtil, UserGroupInformation}
import org.apache.hadoop.security.UserGroupInformation.AuthenticationMethod.KERBEROS
import org.scalatest.time.SpanSugar._
@@ -66,14 +66,15 @@ class DB2KrbIntegrationSuite extends DockerKrbJDBCIntegrationSuite {
}
override def beforeContainerStart(
- hostConfigBuilder: HostConfig.Builder,
- containerConfigBuilder: ContainerConfig.Builder): Unit = {
+ hostConfigBuilder: HostConfig,
+ containerConfigBuilder: ContainerConfig): Unit = {
copyExecutableResource("db2_krb_setup.sh", initDbDir, replaceIp)
- hostConfigBuilder.appendBinds(
- HostConfig.Bind.from(initDbDir.getAbsolutePath)
- .to("/var/custom").readOnly(true).build()
- )
+ val newBind = new Bind(
+ initDbDir.getAbsolutePath,
+ new Volume("/var/custom"),
+ AccessMode.ro)
+ hostConfigBuilder.withBinds(hostConfigBuilder.getBinds :+ newBind: _*)
}
}
diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DockerJDBCIntegrationSuite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DockerJDBCIntegrationSuite.scala
index 55142e6d8de8..837382239514 100644
--- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DockerJDBCIntegrationSuite.scala
+++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DockerJDBCIntegrationSuite.scala
@@ -20,14 +20,18 @@ package org.apache.spark.sql.jdbc
import java.net.ServerSocket
import java.sql.{Connection, DriverManager}
import java.util.Properties
+import java.util.concurrent.TimeUnit
import scala.collection.JavaConverters._
import scala.util.control.NonFatal
-import com.spotify.docker.client._
-import com.spotify.docker.client.DockerClient.{ListContainersParam, LogsParam}
-import com.spotify.docker.client.exceptions.ImageNotFoundException
-import com.spotify.docker.client.messages.{ContainerConfig, HostConfig, PortBinding}
+import com.github.dockerjava.api.DockerClient
+import com.github.dockerjava.api.async.{ResultCallback, ResultCallbackTemplate}
+import com.github.dockerjava.api.command.CreateContainerResponse
+import com.github.dockerjava.api.exception.NotFoundException
+import com.github.dockerjava.api.model._
+import com.github.dockerjava.core.{DefaultDockerClientConfig, DockerClientImpl}
+import com.github.dockerjava.zerodep.ZerodepDockerHttpClient
import org.scalatest.concurrent.Eventually
import org.scalatest.time.SpanSugar._
@@ -88,8 +92,8 @@ abstract class DatabaseOnDocker {
* Optional step before container starts
*/
def beforeContainerStart(
- hostConfigBuilder: HostConfig.Builder,
- containerConfigBuilder: ContainerConfig.Builder): Unit = {}
+ hostConfigBuilder: HostConfig,
+ containerConfigBuilder: ContainerConfig): Unit = {}
}
abstract class DockerJDBCIntegrationSuite
@@ -111,56 +115,75 @@ abstract class DockerJDBCIntegrationSuite
sock.close()
port
}
- private var containerId: String = _
+ private var container: CreateContainerResponse = _
private var pulled: Boolean = false
protected var jdbcUrl: String = _
override def beforeAll(): Unit = runIfTestsEnabled(s"Prepare for ${this.getClass.getName}") {
super.beforeAll()
try {
- docker = DefaultDockerClient.fromEnv.build()
+ val config = DefaultDockerClientConfig.createDefaultConfigBuilder.build
+ val httpClient = new ZerodepDockerHttpClient.Builder()
+ .dockerHost(config.getDockerHost)
+ .sslConfig(config.getSSLConfig)
+ .build()
+ docker = DockerClientImpl.getInstance(config, httpClient)
// Check that Docker is actually up
try {
- docker.ping()
+ docker.pingCmd().exec()
} catch {
case NonFatal(e) =>
log.error("Exception while connecting to Docker. Check whether Docker is running.")
throw e
}
- // Ensure that the Docker image is installed:
try {
- docker.inspectImage(db.imageName)
+ // Ensure that the Docker image is installed:
+ docker.inspectImageCmd(db.imageName).exec()
} catch {
- case e: ImageNotFoundException =>
+ case e: NotFoundException =>
log.warn(s"Docker image ${db.imageName} not found; pulling image from registry")
- docker.pull(db.imageName)
+ docker.pullImageCmd(db.imageName)
+ .start()
+ .awaitCompletion(connectionTimeout.value.toSeconds, TimeUnit.SECONDS)
pulled = true
}
- val hostConfigBuilder = HostConfig.builder()
- .privileged(db.privileged)
- .networkMode("bridge")
- .ipcMode(if (db.usesIpc) "host" else "")
- .portBindings(
- Map(s"${db.jdbcPort}/tcp" -> List(PortBinding.of(dockerIp, externalPort)).asJava).asJava)
- // Create the database container:
- val containerConfigBuilder = ContainerConfig.builder()
- .image(db.imageName)
- .networkDisabled(false)
- .env(db.env.map { case (k, v) => s"$k=$v" }.toSeq.asJava)
- .exposedPorts(s"${db.jdbcPort}/tcp")
- if (db.getEntryPoint.isDefined) {
- containerConfigBuilder.entrypoint(db.getEntryPoint.get)
- }
- if (db.getStartupProcessName.isDefined) {
- containerConfigBuilder.cmd(db.getStartupProcessName.get)
+
+ docker.pullImageCmd(db.imageName)
+ .start()
+ .awaitCompletion(connectionTimeout.value.toSeconds, TimeUnit.SECONDS)
+
+ val hostConfig = HostConfig
+ .newHostConfig()
+ .withNetworkMode("bridge")
+ .withPrivileged(db.privileged)
+ .withPortBindings(PortBinding.parse(s"$externalPort:${db.jdbcPort}"))
+
+ if (db.usesIpc) {
+ hostConfig.withIpcMode("host")
}
- db.beforeContainerStart(hostConfigBuilder, containerConfigBuilder)
- containerConfigBuilder.hostConfig(hostConfigBuilder.build())
- val config = containerConfigBuilder.build()
+
+ val containerConfig = new ContainerConfig()
+
+ db.beforeContainerStart(hostConfig, containerConfig)
+
// Create the database container:
- containerId = docker.createContainer(config).id
+ val createContainerCmd = docker.createContainerCmd(db.imageName)
+ .withHostConfig(hostConfig)
+ .withExposedPorts(ExposedPort.tcp(db.jdbcPort))
+ .withEnv(db.env.map { case (k, v) => s"$k=$v" }.toList.asJava)
+ .withNetworkDisabled(false)
+
+
+ db.getEntryPoint.foreach(ep => createContainerCmd.withEntrypoint(ep))
+ db.getStartupProcessName.foreach(n => createContainerCmd.withCmd(n))
+
+ container = createContainerCmd.exec()
// Start the container and wait until the database can accept JDBC connections:
- docker.startContainer(containerId)
+ docker.startContainerCmd(container.getId).exec()
+ eventually(connectionTimeout, interval(1.second)) {
+ val response = docker.inspectContainerCmd(container.getId).exec()
+ assert(response.getState.getRunning)
+ }
jdbcUrl = db.getJdbcUrl(dockerIp, externalPort)
var conn: Connection = null
eventually(connectionTimeout, interval(1.second)) {
@@ -174,6 +197,7 @@ abstract class DockerJDBCIntegrationSuite
}
} catch {
case NonFatal(e) =>
+ logError(s"Failed to initialize Docker container for ${this.getClass.getName}", e)
try {
afterAll()
} finally {
@@ -206,36 +230,35 @@ abstract class DockerJDBCIntegrationSuite
def dataPreparation(connection: Connection): Unit
private def cleanupContainer(): Unit = {
- if (docker != null && containerId != null && !keepContainer) {
+ if (docker != null && container != null && !keepContainer) {
try {
- docker.killContainer(containerId)
+ docker.killContainerCmd(container.getId).exec()
} catch {
case NonFatal(e) =>
- val exitContainerIds =
- docker.listContainers(ListContainersParam.withStatusExited()).asScala.map(_.id())
- if (exitContainerIds.contains(containerId)) {
- logWarning(s"Container $containerId already stopped")
- } else {
- logWarning(s"Could not stop container $containerId", e)
- }
+ val response = docker.inspectContainerCmd(container.getId).exec()
+ logWarning(s"Container $container already stopped")
+ val status = Option(response).map(_.getState.getStatus).getOrElse("unknown")
+ logWarning(s"Could not stop container $container at stage '$status'", e)
} finally {
logContainerOutput()
- docker.removeContainer(containerId)
+ docker.removeContainerCmd(container.getId).exec()
if (removePulledImage && pulled) {
- docker.removeImage(db.imageName)
+ docker.removeImageCmd(db.imageName).exec()
}
}
}
}
private def logContainerOutput(): Unit = {
- val logStream = docker.logs(containerId, LogsParam.stdout(), LogsParam.stderr())
- try {
- logInfo("\n\n===== CONTAINER LOGS FOR container Id: " + containerId + " =====")
- logInfo(logStream.readFully())
- logInfo("\n\n===== END OF CONTAINER LOGS FOR container Id: " + containerId + " =====")
- } finally {
- logStream.close()
- }
+ logInfo("\n\n===== CONTAINER LOGS FOR container Id: " + container + " =====")
+ docker.logContainerCmd(container.getId)
+ .withStdOut(true)
+ .withStdErr(true)
+ .withFollowStream(true)
+ .withSince(0).exec(
+ new ResultCallbackTemplate[ResultCallback[Frame], Frame] {
+ override def onNext(f: Frame): Unit = logInfo(f.toString)
+ })
+ logInfo("\n\n===== END OF CONTAINER LOGS FOR container Id: " + container + " =====")
}
}
diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MariaDBKrbIntegrationSuite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MariaDBKrbIntegrationSuite.scala
index 873d5ad1ee43..49c9e3dba0d7 100644
--- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MariaDBKrbIntegrationSuite.scala
+++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MariaDBKrbIntegrationSuite.scala
@@ -19,7 +19,7 @@ package org.apache.spark.sql.jdbc
import javax.security.auth.login.Configuration
-import com.spotify.docker.client.messages.{ContainerConfig, HostConfig}
+import com.github.dockerjava.api.model.{AccessMode, Bind, ContainerConfig, HostConfig, Volume}
import org.apache.spark.sql.execution.datasources.jdbc.connection.SecureConnectionProvider
import org.apache.spark.tags.DockerTest
@@ -52,17 +52,17 @@ class MariaDBKrbIntegrationSuite extends DockerKrbJDBCIntegrationSuite {
Some("/docker-entrypoint/mariadb_docker_entrypoint.sh")
override def beforeContainerStart(
- hostConfigBuilder: HostConfig.Builder,
- containerConfigBuilder: ContainerConfig.Builder): Unit = {
+ hostConfigBuilder: HostConfig,
+ containerConfigBuilder: ContainerConfig): Unit = {
copyExecutableResource("mariadb_docker_entrypoint.sh", entryPointDir, replaceIp)
copyExecutableResource("mariadb_krb_setup.sh", initDbDir, replaceIp)
- hostConfigBuilder.appendBinds(
- HostConfig.Bind.from(entryPointDir.getAbsolutePath)
- .to("/docker-entrypoint").readOnly(true).build(),
- HostConfig.Bind.from(initDbDir.getAbsolutePath)
- .to("/docker-entrypoint-initdb.d").readOnly(true).build()
- )
+ val binds =
+ Seq(entryPointDir -> "/docker-entrypoint", initDbDir -> "/docker-entrypoint-initdb.d")
+ .map { case (from, to) =>
+ new Bind(from.getAbsolutePath, new Volume(to), AccessMode.ro)
+ }
+ hostConfigBuilder.withBinds(hostConfigBuilder.getBinds ++ binds: _*)
}
}
diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresKrbIntegrationSuite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresKrbIntegrationSuite.scala
index 667d8c778618..1dcf101b394a 100644
--- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresKrbIntegrationSuite.scala
+++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresKrbIntegrationSuite.scala
@@ -19,7 +19,7 @@ package org.apache.spark.sql.jdbc
import javax.security.auth.login.Configuration
-import com.spotify.docker.client.messages.{ContainerConfig, HostConfig}
+import com.github.dockerjava.api.model.{AccessMode, Bind, ContainerConfig, HostConfig, Volume}
import org.apache.spark.sql.execution.datasources.jdbc.connection.SecureConnectionProvider
import org.apache.spark.tags.DockerTest
@@ -48,14 +48,14 @@ class PostgresKrbIntegrationSuite extends DockerKrbJDBCIntegrationSuite {
s"jdbc:postgresql://$ip:$port/postgres?user=$principal&gsslib=gssapi"
override def beforeContainerStart(
- hostConfigBuilder: HostConfig.Builder,
- containerConfigBuilder: ContainerConfig.Builder): Unit = {
+ hostConfigBuilder: HostConfig,
+ containerConfigBuilder: ContainerConfig): Unit = {
copyExecutableResource("postgres_krb_setup.sh", initDbDir, replaceIp)
-
- hostConfigBuilder.appendBinds(
- HostConfig.Bind.from(initDbDir.getAbsolutePath)
- .to("/docker-entrypoint-initdb.d").readOnly(true).build()
- )
+ val newBind = new Bind(
+ initDbDir.getAbsolutePath,
+ new Volume("/docker-entrypoint-initdb.d"),
+ AccessMode.ro)
+ hostConfigBuilder.withBinds(hostConfigBuilder.getBinds :+ newBind: _*)
}
}
diff --git a/pom.xml b/pom.xml
index 04acbdb3cd6e..3d9b003bd19c 100644
--- a/pom.xml
+++ b/pom.xml
@@ -1219,22 +1219,31 @@
<scope>test</scope>
</dependency>
<dependency>
- <groupId>com.spotify</groupId>
- <artifactId>docker-client</artifactId>
- <version>8.14.1</version>
+ <groupId>com.github.docker-java</groupId>
+ <artifactId>docker-java</artifactId>
+ <version>3.3.4</version>
<scope>test</scope>
- <classifier>shaded</classifier>
<exclusions>
- <exclusion>
- <artifactId>guava</artifactId>
- <groupId>com.google.guava</groupId>
- </exclusion>
<exclusion>
<groupId>commons-logging</groupId>
<artifactId>commons-logging</artifactId>
</exclusion>
+ <exclusion>
+ <groupId>com.github.docker-java</groupId>
+ <artifactId>docker-java-transport-netty</artifactId>
+ </exclusion>
+ <exclusion>
+ <groupId>com.github.docker-java</groupId>
+ <artifactId>docker-java-transport-jersey</artifactId>
+ </exclusion>
</exclusions>
</dependency>
+ <dependency>
+ <groupId>com.github.docker-java</groupId>
+ <artifactId>docker-java-transport-zerodep</artifactId>
+ <version>3.3.4</version>
+ <scope>test</scope>
+ </dependency>
<dependency>
<groupId>com.mysql</groupId>
<artifactId>mysql-connector-j</artifactId>
diff --git a/project/SparkBuild.scala b/project/SparkBuild.scala
index e8c52dc0aff3..f8659a4f4a25 100644
--- a/project/SparkBuild.scala
+++ b/project/SparkBuild.scala
@@ -469,8 +469,7 @@ object SparkBuild extends PomBuild {
/* Protobuf settings */
enable(SparkProtobuf.settings)(protobuf)
- // SPARK-14738 - Remove docker tests from main Spark build
- // enable(DockerIntegrationTests.settings)(dockerIntegrationTests)
+ enable(DockerIntegrationTests.settings)(dockerIntegrationTests)
if (!profiles.contains("volcano")) {
enable(Volcano.settings)(kubernetes)