This is an automated email from the ASF dual-hosted git repository.
dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new 370453adba17 [SPARK-49678][CORE] Support `spark.test.master` in `SparkSubmitArguments`
370453adba17 is described below
commit 370453adba1730b5412750b34e87a35147d71aa2
Author: Dongjoon Hyun <[email protected]>
AuthorDate: Mon Sep 16 20:53:35 2024 -0700
[SPARK-49678][CORE] Support `spark.test.master` in `SparkSubmitArguments`
### What changes were proposed in this pull request?
This PR aims to support `spark.test.master` in `SparkSubmitArguments`.
### Why are the changes needed?
To allow users to control the default master setting during testing and
documentation generation.
#### First, without this change we currently cannot build the `Python Documentation` on an M3 Max (or other high-core machines); it succeeds only on GitHub Actions runners (4 cores) or an equivalent low-core Docker run. Please try the following on your Mac.
**BEFORE**
```
$ build/sbt package -Phive-thriftserver
$ cd python/docs
$ make html
...
java.lang.OutOfMemoryError: Java heap space
...
24/09/16 14:09:55 WARN PythonRunner: Incomplete task 7.0 in stage 30 (TID 177) interrupted: Attempting to kill Python Worker
...
make: *** [html] Error 2
```
**AFTER**
```
$ build/sbt package -Phive-thriftserver
$ cd python/docs
$ JDK_JAVA_OPTIONS="-Dspark.test.master=local[1]" make html
...
build succeeded.
The HTML pages are in build/html.
```
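For context: `JDK_JAVA_OPTIONS` is read by the `java` launcher itself on JDK 9+, so the `-Dspark.test.master=local[1]` flag becomes an ordinary system property in every JVM that the doc build spawns, with no changes to the Spark launch scripts. A minimal way to verify that behavior (a hypothetical standalone snippet, not part of this patch):
```
// CheckProp.scala -- hypothetical verification snippet, not part of this patch.
// Compile with `scalac CheckProp.scala`, then run with:
//   JDK_JAVA_OPTIONS="-Dspark.test.master=local[1]" scala CheckProp
object CheckProp {
  def main(args: Array[String]): Unit = {
    // The JDK 9+ launcher prints "NOTE: Picked up JDK_JAVA_OPTIONS: ..." on
    // stderr before main() runs; the -D value is then a normal system property.
    println(System.getProperty("spark.test.master", "local[*]"))  // local[1]
  }
}
```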
#### Second, in general, we can control every `SparkSubmit`-based command (e.g., the Spark shells) as follows.
**BEFORE (`local[*]`)**
```
$ bin/pyspark
Python 3.9.19 (main, Jun 17 2024, 15:39:29)
[Clang 15.0.0 (clang-1500.3.9.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
WARNING: Using incubator modules: jdk.incubator.vector
Using Spark's default log4j profile: org/apache/spark/log4j2-pattern-layout-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
24/09/16 13:53:02 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /__ / .__/\_,_/_/ /_/\_\   version 4.0.0-SNAPSHOT
      /_/
Using Python version 3.9.19 (main, Jun 17 2024 15:39:29)
Spark context Web UI available at http://localhost:4040
Spark context available as 'sc' (master = local[*], app id = local-1726519982935).
SparkSession available as 'spark'.
>>>
```
**AFTER (`local[1]`)**
```
$ JDK_JAVA_OPTIONS="-Dspark.test.master=local[1]" bin/pyspark
NOTE: Picked up JDK_JAVA_OPTIONS: -Dspark.test.master=local[1]
Python 3.9.19 (main, Jun 17 2024, 15:39:29)
[Clang 15.0.0 (clang-1500.3.9.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
NOTE: Picked up JDK_JAVA_OPTIONS: -Dspark.test.master=local[1]
NOTE: Picked up JDK_JAVA_OPTIONS: -Dspark.test.master=local[1]
WARNING: Using incubator modules: jdk.incubator.vector
Using Spark's default log4j profile: org/apache/spark/log4j2-pattern-layout-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
24/09/16 13:51:03 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /__ / .__/\_,_/_/ /_/\_\   version 4.0.0-SNAPSHOT
      /_/
Using Python version 3.9.19 (main, Jun 17 2024 15:39:29)
Spark context Web UI available at http://localhost:4040
Spark context available as 'sc' (master = local[1], app id = local-1726519863363).
SparkSession available as 'spark'.
>>>
```
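As a side note on the transcript above: the `NOTE: Picked up JDK_JAVA_OPTIONS` line is printed by the `java` launcher once per JVM process, which is presumably why it appears several times here; the `pyspark`/`spark-class` scripts invoke `java` more than once (e.g., once for `org.apache.spark.launcher.Main` to build the command line and once for the driver JVM itself).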
### Does this PR introduce _any_ user-facing change?
No. `spark.test.master` is a new parameter.
### How was this patch tested?
Manual tests.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #48126 from dongjoon-hyun/SPARK-49678.
Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
---
core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala b/core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala
index 32dd2f81bbc8..2c9ddff34805 100644
--- a/core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala
@@ -43,7 +43,8 @@ private[deploy] class SparkSubmitArguments(args: Seq[String], env: Map[String, S
   extends SparkSubmitArgumentsParser with Logging {
   var maybeMaster: Option[String] = None
   // Global defaults. These should be keep to minimum to avoid confusing behavior.
-  def master: String = maybeMaster.getOrElse("local[*]")
+  def master: String =
+    maybeMaster.getOrElse(System.getProperty("spark.test.master", "local[*]"))
   var maybeRemote: Option[String] = None
   var deployMode: String = null
   var executorMemory: String = null
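With this change, the resolution order is: an explicitly supplied master (e.g. via `--master`, which populates `maybeMaster`) first, then the `spark.test.master` system property, then the `local[*]` fallback. A minimal self-contained sketch of that precedence (`MasterDefault` and `resolve` are illustrative names, not Spark code):
```
// Illustrative sketch only; mirrors the getOrElse chain in the patch above.
object MasterDefault {
  def resolve(maybeMaster: Option[String]): String =
    maybeMaster.getOrElse(System.getProperty("spark.test.master", "local[*]"))

  def main(args: Array[String]): Unit = {
    println(resolve(None))              // local[*] (property unset)
    System.setProperty("spark.test.master", "local[1]")
    println(resolve(None))              // local[1] (property supplies the default)
    println(resolve(Some("local[4]")))  // local[4] (an explicit master still wins)
  }
}
```
Because the property only feeds the default, it cannot override a master that a user passes explicitly.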
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]