This is an automated email from the ASF dual-hosted git repository.
hvanhovell pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-3.5 by this push:
new c5203abcbd1 [SPARK-45371][CONNECT] Fix shading issues in the Spark Connect Scala Client
c5203abcbd1 is described below
commit c5203abcbd191423071ef3603e95a7941bb1eec2
Author: Herman van Hovell <[email protected]>
AuthorDate: Mon Oct 2 13:03:06 2023 -0400
[SPARK-45371][CONNECT] Fix shading issues in the Spark Connect Scala Client
### What changes were proposed in this pull request?
This PR fixes shading for the Spark Connect Scala Client Maven build. The following things are addressed:
- Guava and Protobuf are now included in the shaded jar. They were missing, and were causing users to see `ClassNotFoundException`s (a quick way to check for this is sketched right after this list).
- Fixed duplicate shading of Guava. We now reuse the relocation defined in the parent pom.
- Fixed the duplicate Netty dependency (shaded and transitive). One copy was used for gRPC and the other was needed by Arrow. This is fixed by pulling Arrow into the shaded jar.
- Use the same shading package prefix as the one defined in the parent pom.
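One way to sanity-check the shaded artifact is to list its entries and confirm that Guava and Protobuf classes appear under the relocated package names. The snippet below is a minimal sketch, not part of this patch: the jar path is passed in by hand, and the `org/sparkproject/...` prefixes are an assumption based on the default `spark.shade.packageName` (`org.sparkproject`); adjust them if your build overrides that property.

```scala
import java.util.jar.JarFile
import scala.jdk.CollectionConverters._

// Minimal sketch: pass the path of the shaded spark-connect-client-jvm jar
// (e.g. from ~/.m2/repository after Step 1 below) and check that Guava and
// Protobuf classes are present under the relocated package names.
object ShadedJarCheck {
  def main(args: Array[String]): Unit = {
    val jarPath = args(0) // path to the shaded jar built and installed locally
    val entries = new JarFile(jarPath).entries().asScala.map(_.getName).toSeq

    // Assumed relocation prefixes, based on the default spark.shade.packageName
    // (org.sparkproject); adjust if your build overrides it.
    val expectedPrefixes = Seq(
      "org/sparkproject/guava/",               // Guava, relocated in the parent pom
      "org/sparkproject/com/google/protobuf/"  // Protobuf
    )
    expectedPrefixes.foreach { prefix =>
      val present = entries.exists(_.startsWith(prefix))
      println(s"$prefix -> ${if (present) "present" else "MISSING"}")
    }
  }
}
```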
### Why are the changes needed?
The Maven artifacts for the Spark Connect Scala Client are currently broken.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Manual tests.
#### Step 1: Build the new shaded library and install it in the local Maven repository
`build/mvn clean install -pl connector/connect/client/jvm -am -DskipTests`
#### Step 2: Start Connect Server
`connector/connect/bin/spark-connect`
#### Step 3: Launch REPL using the newly created library
This step requires [coursier](https://get-coursier.io/) to be installed.
`cs launch --jvm zulu:17.0.8 --scala 2.13.9 -r m2Local com.lihaoyi:::ammonite:2.5.11 org.apache.spark::spark-connect-client-jvm:4.0.0-SNAPSHOT --java-opt --add-opens=java.base/java.nio=ALL-UNNAMED -M org.apache.spark.sql.application.ConnectRepl`
#### Step 4: Run a bunch of commands:
```scala
// Check version
spark.version

// Run a simple query
{
  spark.range(1, 10000, 1)
    .select($"id", $"id" % 5 as "group", rand(1).as("v1"), rand(2).as("v2"))
    .groupBy($"group")
    .agg(
      avg($"v1").as("v1_avg"),
      avg($"v2").as("v2_avg"))
    .show()
}

// Run a streaming query
{
  import org.apache.spark.sql.execution.streaming.ProcessingTimeTrigger
  val query_name = "simple_streaming"
  val stream = spark.readStream
    .format("rate")
    .option("numPartitions", "1")
    .option("rowsPerSecond", "10")
    .load()
    .withWatermark("timestamp", "10 milliseconds")
    .groupBy(window(col("timestamp"), "10 milliseconds"))
    .count()
    .selectExpr("window.start as timestamp", "count as num_events")
    .writeStream
    .format("memory")
    .queryName(query_name)
    .trigger(ProcessingTimeTrigger.create("10 milliseconds"))
  // run for 20 seconds
  val query = stream.start()
  val start = System.currentTimeMillis()
  val end = System.currentTimeMillis() + 20 * 1000
  while (System.currentTimeMillis() < end) {
    println(s"time: ${System.currentTimeMillis() - start} ms")
    println(query.status)
    spark.sql(s"select * from ${query_name}").show()
    Thread.sleep(500)
  }
  query.stop()
}
```
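As an additional smoke test (not part of the original manual run), the locally installed artifact can also be exercised as a plain library dependency rather than through the Ammonite REPL. The following is a rough sketch; it assumes the Connect server from Step 2 is still running on the default `sc://localhost:15002` endpoint.

```scala
import org.apache.spark.sql.SparkSession

// Rough sketch: connect to a locally running Spark Connect server using the
// shaded client jar as a regular dependency. Endpoint and port are assumptions
// based on the server's defaults.
object ConnectSmokeTest {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .remote("sc://localhost:15002")
      .getOrCreate()

    // Any query here goes through the shaded gRPC/Protobuf/Arrow code paths.
    println(s"Connected to Spark ${spark.version}")
    spark.range(5).show()

    spark.close()
  }
}
```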
Closes #43195 from hvanhovell/SPARK-45371.
Authored-by: Herman van Hovell <[email protected]>
Signed-off-by: Herman van Hovell <[email protected]>
(cherry picked from commit e53abbbceaa2c41babaa23fe4c2f282f559b4c03)
Signed-off-by: Herman van Hovell <[email protected]>
---
connector/connect/client/jvm/pom.xml | 39 +++++++++++++++++++++++++++---------
1 file changed, 30 insertions(+), 9 deletions(-)
diff --git a/connector/connect/client/jvm/pom.xml b/connector/connect/client/jvm/pom.xml
index 67227ef38eb..236e5850b76 100644
--- a/connector/connect/client/jvm/pom.xml
+++ b/connector/connect/client/jvm/pom.xml
@@ -50,10 +50,20 @@
       <artifactId>spark-sketch_${scala.binary.version}</artifactId>
       <version>${project.version}</version>
     </dependency>
+    <!--
+      We need to define guava and protobuf here because we need to change the scope of both from
+      provided to compile. If we don't do this we can't shade these libraries.
+    -->
     <dependency>
       <groupId>com.google.guava</groupId>
       <artifactId>guava</artifactId>
       <version>${connect.guava.version}</version>
+      <scope>compile</scope>
+    </dependency>
+    <dependency>
+      <groupId>com.google.protobuf</groupId>
+      <artifactId>protobuf-java</artifactId>
+      <scope>compile</scope>
     </dependency>
     <dependency>
       <groupId>com.lihaoyi</groupId>
@@ -85,6 +95,7 @@
         <artifactId>maven-shade-plugin</artifactId>
         <configuration>
           <shadedArtifactAttached>false</shadedArtifactAttached>
+          <promoteTransitiveDependencies>true</promoteTransitiveDependencies>
           <artifactSet>
             <includes>
               <include>com.google.android:*</include>
@@ -92,52 +103,62 @@
               <include>com.google.code.findbugs:*</include>
               <include>com.google.code.gson:*</include>
               <include>com.google.errorprone:*</include>
-              <include>com.google.guava:*</include>
               <include>com.google.j2objc:*</include>
               <include>com.google.protobuf:*</include>
+              <include>com.google.flatbuffers:*</include>
               <include>io.grpc:*</include>
               <include>io.netty:*</include>
               <include>io.perfmark:*</include>
+              <include>org.apache.arrow:*</include>
               <include>org.codehaus.mojo:*</include>
               <include>org.checkerframework:*</include>
               <include>org.apache.spark:spark-connect-common_${scala.binary.version}</include>
+              <include>org.apache.spark:spark-sql-api_${scala.binary.version}</include>
             </includes>
           </artifactSet>
           <relocations>
             <relocation>
               <pattern>io.grpc</pattern>
-              <shadedPattern>${spark.shade.packageName}.connect.client.io.grpc</shadedPattern>
+              <shadedPattern>${spark.shade.packageName}.io.grpc</shadedPattern>
               <includes>
                 <include>io.grpc.**</include>
               </includes>
             </relocation>
             <relocation>
               <pattern>com.google</pattern>
-              <shadedPattern>${spark.shade.packageName}.connect.client.com.google</shadedPattern>
+              <shadedPattern>${spark.shade.packageName}.com.google</shadedPattern>
+              <excludes>
+                <!-- Guava is relocated to ${spark.shade.packageName}.guava (see the parent pom.xml) -->
+                <exclude>com.google.common.**</exclude>
+              </excludes>
             </relocation>
             <relocation>
               <pattern>io.netty</pattern>
-              <shadedPattern>${spark.shade.packageName}.connect.client.io.netty</shadedPattern>
+              <shadedPattern>${spark.shade.packageName}.io.netty</shadedPattern>
             </relocation>
             <relocation>
               <pattern>org.checkerframework</pattern>
-              <shadedPattern>${spark.shade.packageName}.connect.client.org.checkerframework</shadedPattern>
+              <shadedPattern>${spark.shade.packageName}.org.checkerframework</shadedPattern>
             </relocation>
             <relocation>
               <pattern>javax.annotation</pattern>
-              <shadedPattern>${spark.shade.packageName}.connect.client.javax.annotation</shadedPattern>
+              <shadedPattern>${spark.shade.packageName}.javax.annotation</shadedPattern>
             </relocation>
             <relocation>
               <pattern>io.perfmark</pattern>
-              <shadedPattern>${spark.shade.packageName}.connect.client.io.perfmark</shadedPattern>
+              <shadedPattern>${spark.shade.packageName}.io.perfmark</shadedPattern>
             </relocation>
             <relocation>
               <pattern>org.codehaus</pattern>
-              <shadedPattern>${spark.shade.packageName}.connect.client.org.codehaus</shadedPattern>
+              <shadedPattern>${spark.shade.packageName}.org.codehaus</shadedPattern>
+            </relocation>
+            <relocation>
+              <pattern>org.apache.arrow</pattern>
+              <shadedPattern>${spark.shade.packageName}.org.apache.arrow</shadedPattern>
             </relocation>
             <relocation>
               <pattern>android.annotation</pattern>
-              <shadedPattern>${spark.shade.packageName}.connect.client.android.annotation</shadedPattern>
+              <shadedPattern>${spark.shade.packageName}.android.annotation</shadedPattern>
             </relocation>
           </relocations>
           <!--SPARK-42228: Add `ServicesResourceTransformer` to relocation class names in META-INF/services for grpc-->