This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new 370453adba17 [SPARK-49678][CORE] Support `spark.test.master` in `SparkSubmitArguments`
370453adba17 is described below

commit 370453adba1730b5412750b34e87a35147d71aa2
Author: Dongjoon Hyun <[email protected]>
AuthorDate: Mon Sep 16 20:53:35 2024 -0700

    [SPARK-49678][CORE] Support `spark.test.master` in `SparkSubmitArguments`
    
    ### What changes were proposed in this pull request?
    
    This PR aims to support `spark.test.master` in `SparkSubmitArguments`.
    
    ### Why are the changes needed?
    
    To allow users to control the default master setting during testing and documentation generation.
    
    #### First, without this change we currently cannot build the `Python Documentation` on an M3 Max (or other high-core machines). It succeeds only on GitHub Actions runners (4 cores) or an equivalent low-core Docker run. Please try the following on your Macs.
    
    **BEFORE**
    ```
    $ build/sbt package -Phive-thriftserver
    $ cd python/docs
    $ make html
    ...
    java.lang.OutOfMemoryError: Java heap space
    ...
    24/09/16 14:09:55 WARN PythonRunner: Incomplete task 7.0 in stage 30 (TID 177) interrupted: Attempting to kill Python Worker
    ...
    make: *** [html] Error 2
    ```
    
    **AFTER**
    ```
    $ build/sbt package -Phive-thriftserver
    $ cd python/docs
    $ JDK_JAVA_OPTIONS="-Dspark.test.master=local[1]" make html
    ...
    build succeeded.
    
    The HTML pages are in build/html.
    ```
    
    #### Second, in general, this lets us control all `SparkSubmit`-based invocations (e.g., the Spark shells) as follows.
    
    **BEFORE (`local[*]`)**
    ```
    $ bin/pyspark
    Python 3.9.19 (main, Jun 17 2024, 15:39:29)
    [Clang 15.0.0 (clang-1500.3.9.4)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    WARNING: Using incubator modules: jdk.incubator.vector
    Using Spark's default log4j profile: org/apache/spark/log4j2-pattern-layout-defaults.properties
    Setting default log level to "WARN".
    To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
    24/09/16 13:53:02 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Welcome to
          ____              __
         / __/__  ___ _____/ /__
        _\ \/ _ \/ _ `/ __/  '_/
       /__ / .__/\_,_/_/ /_/\_\   version 4.0.0-SNAPSHOT
          /_/
    
    Using Python version 3.9.19 (main, Jun 17 2024 15:39:29)
    Spark context Web UI available at http://localhost:4040
    Spark context available as 'sc' (master = local[*], app id = local-1726519982935).
    SparkSession available as 'spark'.
    >>>
    ```
    
    **AFTER (`local[1]`)**
    ```
    $ JDK_JAVA_OPTIONS="-Dspark.test.master=local[1]" bin/pyspark
    NOTE: Picked up JDK_JAVA_OPTIONS: -Dspark.test.master=local[1]
    Python 3.9.19 (main, Jun 17 2024, 15:39:29)
    [Clang 15.0.0 (clang-1500.3.9.4)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    NOTE: Picked up JDK_JAVA_OPTIONS: -Dspark.test.master=local[1]
    NOTE: Picked up JDK_JAVA_OPTIONS: -Dspark.test.master=local[1]
    WARNING: Using incubator modules: jdk.incubator.vector
    Using Spark's default log4j profile: org/apache/spark/log4j2-pattern-layout-defaults.properties
    Setting default log level to "WARN".
    To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
    24/09/16 13:51:03 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Welcome to
          ____              __
         / __/__  ___ _____/ /__
        _\ \/ _ \/ _ `/ __/  '_/
       /__ / .__/\_,_/_/ /_/\_\   version 4.0.0-SNAPSHOT
          /_/
    
    Using Python version 3.9.19 (main, Jun 17 2024 15:39:29)
    Spark context Web UI available at http://localhost:4040
    Spark context available as 'sc' (master = local[1], app id = local-1726519863363).
    SparkSession available as 'spark'.
    >>>
    ```
    
    ### Does this PR introduce _any_ user-facing change?
    
    No. `spark.test.master` is a new parameter.
    
    ### How was this patch tested?
    
    Manual tests.
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    No.
    
    Closes #48126 from dongjoon-hyun/SPARK-49678.
    
    Authored-by: Dongjoon Hyun <[email protected]>
    Signed-off-by: Dongjoon Hyun <[email protected]>
---
 core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala b/core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala
index 32dd2f81bbc8..2c9ddff34805 100644
--- a/core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala
@@ -43,7 +43,8 @@ private[deploy] class SparkSubmitArguments(args: Seq[String], env: Map[String, S
   extends SparkSubmitArgumentsParser with Logging {
   var maybeMaster: Option[String] = None
   // Global defaults. These should be keep to minimum to avoid confusing behavior.
-  def master: String = maybeMaster.getOrElse("local[*]")
+  def master: String =
+    maybeMaster.getOrElse(System.getProperty("spark.test.master", "local[*]"))
   var maybeRemote: Option[String] = None
   var deployMode: String = null
   var executorMemory: String = null
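
The one-line change above resolves the master with a simple precedence: an explicit `--master` argument wins, then the `spark.test.master` system property, then the hard-coded default `local[*]`. A minimal standalone sketch of that precedence (the object name `MasterPrecedence` is hypothetical and not part of Spark's code):

```scala
// Sketch of the master-resolution precedence introduced by this patch.
// Not the real SparkSubmitArguments class; only the getOrElse chain is modeled.
object MasterPrecedence {
  // maybeMaster holds the value of an explicit --master argument, if any.
  def resolveMaster(maybeMaster: Option[String]): String =
    maybeMaster.getOrElse(System.getProperty("spark.test.master", "local[*]"))

  def main(args: Array[String]): Unit = {
    // With no --master and no property set, the default applies.
    println(resolveMaster(None))

    // Setting the property (e.g. via -Dspark.test.master=local[1] in
    // JDK_JAVA_OPTIONS) overrides the default...
    System.setProperty("spark.test.master", "local[1]")
    println(resolveMaster(None))

    // ...but an explicit --master still takes precedence over the property.
    println(resolveMaster(Some("yarn")))
  }
}
```

This mirrors why `JDK_JAVA_OPTIONS="-Dspark.test.master=local[1]"` in the examples above changes the default shown by `bin/pyspark` without affecting jobs that pass `--master` explicitly.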


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
