This is an automated email from the ASF dual-hosted git repository.
dongjoon pushed a commit to branch branch-4.0
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-4.0 by this push:
new 2da838bbd601 [SPARK-52612][INFRA] Add an env `NO_PROVIDED_SPARK_JARS`
to control collection behavior of `sbt/package` for `spark-avro.jar` and
`spark-protobuf.jar`
2da838bbd601 is described below
commit 2da838bbd60119a01fba89dc82d6be70d99a8d38
Author: yangjie01 <[email protected]>
AuthorDate: Tue Jul 1 20:58:11 2025 -0700
[SPARK-52612][INFRA] Add an env `NO_PROVIDED_SPARK_JARS` to control
collection behavior of `sbt/package` for `spark-avro.jar` and
`spark-protobuf.jar`
### What changes were proposed in this pull request?
This PR introduces an environment variable named `NO_PROVIDED_SPARK_JARS` that
controls the collection behavior of the `sbt/package` command: `spark-avro.jar`
and `spark-protobuf.jar` are collected into the
`assembly/target/scala-2.13/jars` directory only during documentation generation.
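For reference, a condensed sketch of how the flag is parsed (mirroring the
`noProvidedSparkJars` logic in the `SparkBuild.scala` diff below; unset, `1`,
or `true` means provided-scope jars are skipped, and the documentation build
opts back in with `NO_PROVIDED_SPARK_JARS=0`):
```
// Condensed from the SparkBuild.scala change below: the flag defaults to "on",
// i.e. provided-scope jars (spark-avro, spark-protobuf) are NOT collected.
val noProvidedSparkJars: Boolean =
  sys.env.getOrElse("NO_PROVIDED_SPARK_JARS", "1") == "1" ||
    sys.env.getOrElse("NO_PROVIDED_SPARK_JARS", "true").equalsIgnoreCase("true")
```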
### Why are the changes needed?
1. To ensure that, by default, the `sbt/package` command does not collect
jars with a `provided` scope, such as `spark-avro.jar` and
`spark-protobuf.jar`, into the `assembly/target/scala-2.13/jars` directory,
maintaining consistency with Maven's behavior.
2. To ensure that, during documentation generation, the `sbt/package`
command collects the necessary jars into the `assembly/target/scala-2.13/jars`
directory, so that no dependencies are missing for the documentation
generation task.
3. To avoid the following error when executing benchmark tasks using GitHub
Actions (a conceptual sketch of this failure mode follows the log below):
```
25/06/28 07:03:45 ERROR SparkContext: Failed to add
file:///home/runner/work/spark/spark/assembly/target/scala-2.13/jars/spark-avro_2.13-4.1.0-SNAPSHOT.jar
to Spark environment
java.lang.IllegalArgumentException: requirement failed: File
spark-avro_2.13-4.1.0-SNAPSHOT.jar was already registered with a different path
(old path =
/home/runner/work/spark/spark/connector/avro/target/scala-2.13/spark-avro_2.13-4.1.0-SNAPSHOT.jar,
new path =
/home/runner/work/spark/spark/assembly/target/scala-2.13/jars/spark-avro_2.13-4.1.0-SNAPSHOT.jar
...
```
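The root cause: Spark registers added jars keyed by bare file name, so the same
jar name appearing under two different paths fails the registration check. A
conceptual sketch of that behavior (illustrative only, not Spark's actual
implementation; `register` and `addedJars` are hypothetical names):
```
import scala.collection.mutable

// Hypothetical sketch: files keyed by bare file name, as the error message implies.
val addedJars = mutable.Map.empty[String, String] // file name -> full path

def register(path: String): Unit = {
  val name = path.substring(path.lastIndexOf('/') + 1)
  addedJars.get(name) match {
    case Some(oldPath) if oldPath != path =>
      throw new IllegalArgumentException(
        s"requirement failed: File $name was already registered with a different path " +
          s"(old path = $oldPath, new path = $path)")
    case _ => addedJars(name) = path
  }
}
```
With the fix, the default `sbt/package` no longer drops a second copy of
`spark-avro_2.13-4.1.0-SNAPSHOT.jar` into `assembly/target/scala-2.13/jars`,
so only one path is ever registered.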
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
- Passed GitHub Actions.
- Manually confirmed that benchmark tasks are not affected and that the
ERROR log described above no longer appears during benchmark task execution.
### Was this patch authored or co-authored using generative AI tooling?
No
Closes #51321 from LuciferYang/SPARK-52612.
Authored-by: yangjie01 <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit 591e1c38dedd96cde8d56804c3c642da2adc9f71)
Signed-off-by: Dongjoon Hyun <[email protected]>
---
docs/_plugins/build_api_docs.rb | 2 +-
project/SparkBuild.scala | 9 ++++++++-
2 files changed, 9 insertions(+), 2 deletions(-)
diff --git a/docs/_plugins/build_api_docs.rb b/docs/_plugins/build_api_docs.rb
index 0aa0db0bf989..590b1c3bd93d 100644
--- a/docs/_plugins/build_api_docs.rb
+++ b/docs/_plugins/build_api_docs.rb
@@ -45,7 +45,7 @@ def build_spark_if_necessary
   print_header "Building Spark."
   cd(SPARK_PROJECT_ROOT)

-  command = "build/sbt -Phive -Pkinesis-asl clean package"
+  command = "NO_PROVIDED_SPARK_JARS=0 build/sbt -Phive -Pkinesis-asl clean package"
   puts "Running '#{command}'; this may take a few minutes..."
   system(command) || raise("Failed to build Spark")
   $spark_package_is_built = true
diff --git a/project/SparkBuild.scala b/project/SparkBuild.scala
index f57438228745..cded163e81f3 100644
--- a/project/SparkBuild.scala
+++ b/project/SparkBuild.scala
@@ -1517,6 +1517,9 @@ object CopyDependencies {
     val fid = (LocalProject("connect") / assembly).value
     val fidClient = (LocalProject("connect-client-jvm") / assembly).value
     val fidProtobuf = (LocalProject("protobuf") / assembly).value
+    val noProvidedSparkJars: Boolean = sys.env.getOrElse("NO_PROVIDED_SPARK_JARS", "1") == "1" ||
+      sys.env.getOrElse("NO_PROVIDED_SPARK_JARS", "true").toLowerCase(Locale.getDefault()) == "true"

     (Compile / dependencyClasspath).value.map(_.data)
       .filter { jar => jar.isFile() }
@@ -1532,12 +1535,16 @@ object CopyDependencies {
         // Don't copy the spark connect common JAR as it is shaded in the spark connect.
       } else if (jar.getName.contains("connect-client-jvm")) {
         // Do not place Spark Connect client jars as it is not built-in.
+      } else if (noProvidedSparkJars && jar.getName.contains("spark-avro")) {
+        // Do not place Spark Avro jars as it is not built-in.
       } else if (jar.getName.contains("spark-connect") &&
         !SbtPomKeys.profiles.value.contains("noshade-connect")) {
         Files.copy(fid.toPath, destJar.toPath)
       } else if (jar.getName.contains("spark-protobuf") &&
         !SbtPomKeys.profiles.value.contains("noshade-protobuf")) {
-        Files.copy(fidProtobuf.toPath, destJar.toPath)
+        if (!noProvidedSparkJars) {
+          Files.copy(fidProtobuf.toPath, destJar.toPath)
+        }
       } else {
         Files.copy(jar.toPath(), destJar.toPath())
       }
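Taken together, the new branch logic reduces to roughly the following decision
(a simplified sketch of the code above; `profiles` stands in for
`SbtPomKeys.profiles.value`, and the return values are descriptive labels, not
real build output):
```
// Simplified sketch of the copy decision after this change (not verbatim build code).
def copyDecision(jarName: String, noProvidedSparkJars: Boolean, profiles: Seq[String]): String =
  if (noProvidedSparkJars && jarName.contains("spark-avro")) {
    "skip" // provided scope; not collected by default
  } else if (jarName.contains("spark-protobuf") && !profiles.contains("noshade-protobuf")) {
    if (noProvidedSparkJars) "skip" // provided scope; skipped by default
    else "copy shaded protobuf jar" // docs build sets NO_PROVIDED_SPARK_JARS=0
  } else {
    "copy jar as-is"
  }
```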
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]