This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-4.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-4.0 by this push:
     new 2da838bbd601 [SPARK-52612][INFRA] Add an env `NO_PROVIDED_SPARK_JARS` to control collection behavior of `sbt/package` for `spark-avro.jar` and `spark-protobuf.jar`
2da838bbd601 is described below

commit 2da838bbd60119a01fba89dc82d6be70d99a8d38
Author: yangjie01 <[email protected]>
AuthorDate: Tue Jul 1 20:58:11 2025 -0700

    [SPARK-52612][INFRA] Add an env `NO_PROVIDED_SPARK_JARS` to control collection behavior of `sbt/package` for `spark-avro.jar` and `spark-protobuf.jar`
    
    ### What changes were proposed in this pull request?
    This PR introduces an environment variable, `NO_PROVIDED_SPARK_JARS`, that controls
    whether the `sbt/package` command collects `spark-avro.jar` and `spark-protobuf.jar`
    into the `assembly/target/scala-2.13/jars` directory; with this change, these jars are
    collected only during documentation generation.
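    
    As a rough sketch of the intended flag semantics (illustrative only: the object and
    method names below are made up, while the env var name, defaults, and comparison logic
    mirror the patch):
    
    ```scala
    import java.util.Locale
    
    object NoProvidedSparkJarsDemo {
      // Unset, "1", or "true" means "skip provided-scope jars", matching Maven's behavior;
      // the docs build exports NO_PROVIDED_SPARK_JARS=0 so the jars are still collected.
      def skipProvidedJars(env: Map[String, String]): Boolean =
        env.getOrElse("NO_PROVIDED_SPARK_JARS", "1") == "1" ||
          env.getOrElse("NO_PROVIDED_SPARK_JARS", "true")
            .toLowerCase(Locale.getDefault()) == "true"
    
      def main(args: Array[String]): Unit = {
        assert(skipProvidedJars(Map.empty))                              // default: skip
        assert(skipProvidedJars(Map("NO_PROVIDED_SPARK_JARS" -> "1")))   // skip
        assert(!skipProvidedJars(Map("NO_PROVIDED_SPARK_JARS" -> "0")))  // docs build: collect
      }
    }
    ```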
    
    ### Why are the changes needed?
    1. To ensure that, by default, the `sbt/package` command does not collect jars with a
    `provided` scope, such as `spark-avro.jar` and `spark-protobuf.jar`, into the
    `assembly/target/scala-2.13/jars` directory, maintaining consistency with Maven's behavior.
    
    2. To ensure that, during documentation generation, the `sbt/package` command collects
    the necessary jars into the `assembly/target/scala-2.13/jars` directory, so that no
    dependencies are missing for the documentation build.
    
    3. To avoid the following error when executing benchmark tasks on GitHub Actions
    (a simplified sketch of the registration check behind this error follows the log):
    
    ```
    25/06/28 07:03:45 ERROR SparkContext: Failed to add file:///home/runner/work/spark/spark/assembly/target/scala-2.13/jars/spark-avro_2.13-4.1.0-SNAPSHOT.jar to Spark environment
    java.lang.IllegalArgumentException: requirement failed: File spark-avro_2.13-4.1.0-SNAPSHOT.jar was already registered with a different path (old path = /home/runner/work/spark/spark/connector/avro/target/scala-2.13/spark-avro_2.13-4.1.0-SNAPSHOT.jar, new path = /home/runner/work/spark/spark/assembly/target/scala-2.13/jars/spark-avro_2.13-4.1.0-SNAPSHOT.jar
    ...
    ```
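    
    For context, Spark's file server keys added jars by file name and rejects a second
    registration of the same name from a different path. A simplified sketch of that kind
    of check (not the actual Spark source; the class name `JarRegistry` is made up):
    
    ```scala
    import java.io.File
    import java.util.concurrent.ConcurrentHashMap
    
    // Name-keyed registry: adding spark-avro_2.13-4.1.0-SNAPSHOT.jar once from
    // connector/avro/target and again from assembly/target/.../jars trips the check.
    class JarRegistry {
      private val jars = new ConcurrentHashMap[String, File]()
    
      def addJar(file: File): Unit = {
        val existing = jars.putIfAbsent(file.getName, file)
        require(existing == null || existing == file,
          s"File ${file.getName} was already registered with a different path " +
            s"(old path = $existing, new path = $file)")
      }
    }
    ```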
    
    ### Does this PR introduce _any_ user-facing change?
    No
    
    ### How was this patch tested?
    - Passed GitHub Actions.
    - Manually confirmed that benchmark tasks are not affected and that the ERROR log
      described above no longer appears during benchmark runs (a small local verification
      sketch follows).
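    
    For local spot-checking, something along these lines can list any provided-scope Spark
    jars that ended up in the assembly directory (illustrative only; the object name
    `CheckProvidedJars` is made up and the path assumes the default Scala 2.13 layout):
    
    ```scala
    import java.io.File
    
    object CheckProvidedJars {
      def main(args: Array[String]): Unit = {
        // With the default NO_PROVIDED_SPARK_JARS setting this directory should contain
        // neither spark-avro nor spark-protobuf jars after `build/sbt package`.
        val jarsDir = new File("assembly/target/scala-2.13/jars")
        val provided = Option(jarsDir.listFiles()).getOrElse(Array.empty[File])
          .filter(f => f.getName.contains("spark-avro") || f.getName.contains("spark-protobuf"))
        if (provided.isEmpty) {
          println("No provided-scope Spark jars were collected (expected by default).")
        } else {
          provided.foreach(f => println(s"Collected: ${f.getName}"))
        }
      }
    }
    ```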
    
    ### Was this patch authored or co-authored using generative AI tooling?
    No
    
    Closes #51321 from LuciferYang/SPARK-52612.
    
    Authored-by: yangjie01 <[email protected]>
    Signed-off-by: Dongjoon Hyun <[email protected]>
    (cherry picked from commit 591e1c38dedd96cde8d56804c3c642da2adc9f71)
    Signed-off-by: Dongjoon Hyun <[email protected]>
---
 docs/_plugins/build_api_docs.rb | 2 +-
 project/SparkBuild.scala        | 9 ++++++++-
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/docs/_plugins/build_api_docs.rb b/docs/_plugins/build_api_docs.rb
index 0aa0db0bf989..590b1c3bd93d 100644
--- a/docs/_plugins/build_api_docs.rb
+++ b/docs/_plugins/build_api_docs.rb
@@ -45,7 +45,7 @@ def build_spark_if_necessary
 
   print_header "Building Spark."
   cd(SPARK_PROJECT_ROOT)
-  command = "build/sbt -Phive -Pkinesis-asl clean package"
+  command = "NO_PROVIDED_SPARK_JARS=0 build/sbt -Phive -Pkinesis-asl clean package"
   puts "Running '#{command}'; this may take a few minutes..."
   system(command) || raise("Failed to build Spark")
   $spark_package_is_built = true
diff --git a/project/SparkBuild.scala b/project/SparkBuild.scala
index f57438228745..cded163e81f3 100644
--- a/project/SparkBuild.scala
+++ b/project/SparkBuild.scala
@@ -1517,6 +1517,9 @@ object CopyDependencies {
       val fid = (LocalProject("connect") / assembly).value
       val fidClient = (LocalProject("connect-client-jvm") / assembly).value
       val fidProtobuf = (LocalProject("protobuf") / assembly).value
+      val noProvidedSparkJars: Boolean = sys.env.getOrElse("NO_PROVIDED_SPARK_JARS", "1") == "1" ||
+        sys.env.getOrElse("NO_PROVIDED_SPARK_JARS", "true")
+          .toLowerCase(Locale.getDefault()) == "true"
 
       (Compile / dependencyClasspath).value.map(_.data)
         .filter { jar => jar.isFile() }
@@ -1532,12 +1535,16 @@ object CopyDependencies {
            // Don't copy the spark connect common JAR as it is shaded in the spark connect.
           } else if (jar.getName.contains("connect-client-jvm")) {
             // Do not place Spark Connect client jars as it is not built-in.
+          } else if (noProvidedSparkJars && jar.getName.contains("spark-avro")) {
+            // Do not place Spark Avro jars as it is not built-in.
           } else if (jar.getName.contains("spark-connect") &&
             !SbtPomKeys.profiles.value.contains("noshade-connect")) {
             Files.copy(fid.toPath, destJar.toPath)
           } else if (jar.getName.contains("spark-protobuf") &&
             !SbtPomKeys.profiles.value.contains("noshade-protobuf")) {
-            Files.copy(fidProtobuf.toPath, destJar.toPath)
+            if (!noProvidedSparkJars) {
+              Files.copy(fidProtobuf.toPath, destJar.toPath)
+            }
           } else {
             Files.copy(jar.toPath(), destJar.toPath())
           }


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
