This is an automated email from the ASF dual-hosted git repository.
dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new 87cae7bc7870 [SPARK-47552][CORE] Set `spark.hadoop.fs.s3a.connection.establish.timeout` to 30s if missing
87cae7bc7870 is described below
commit 87cae7bc7870bacafc6afad99ba86a6efca2a464
Author: Dongjoon Hyun <[email protected]>
AuthorDate: Mon Mar 25 16:06:03 2024 -0700
[SPARK-47552][CORE] Set `spark.hadoop.fs.s3a.connection.establish.timeout` to 30s if missing
### What changes were proposed in this pull request?
This PR aims to handle HADOOP-19097 on the Apache Spark side. We can remove this workaround once Apache Hadoop `3.4.1` is released.
- https://github.com/apache/hadoop/pull/6601
### Why are the changes needed?
Apache Hadoop warns about its own default value for this option. The default is fixed in Apache Hadoop 3.4.1.
```
24/03/25 14:46:21 WARN ConfigurationHelper: Option fs.s3a.connection.establish.timeout is too low (5,000 ms). Setting to 15,000 ms instead
```
This change suppresses that default-value warning in a way that stays consistent with future Hadoop releases.
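The patch only fills in a default, so an explicit user setting still takes precedence. A minimal sketch of that `setIfMissing` behavior (plain Scala, no Spark dependency; `SetIfMissingSketch` is a hypothetical helper for illustration, not Spark code):

```scala
// Hypothetical sketch of SparkConf.setIfMissing semantics: the 30s default
// is written only when the key is absent, so user overrides always win.
object SetIfMissingSketch {
  val Key = "spark.hadoop.fs.s3a.connection.establish.timeout"

  // Write the value only if the key is not already set.
  def setIfMissing(conf: scala.collection.mutable.Map[String, String],
                   key: String,
                   value: String): Unit = {
    if (!conf.contains(key)) conf(key) = value
  }

  def main(args: Array[String]): Unit = {
    // Case 1: the user already configured a timeout; the default must not clobber it.
    val userConf = scala.collection.mutable.Map(Key -> "60s")
    setIfMissing(userConf, Key, "30s")
    println(userConf(Key)) // prints "60s"

    // Case 2: the key is missing; the 30s default is filled in.
    val emptyConf = scala.collection.mutable.Map.empty[String, String]
    setIfMissing(emptyConf, Key, "30s")
    println(emptyConf(Key)) // prints "30s"
  }
}
```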
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass the CIs. Also verified manually as follows.
**BUILD**
```
$ dev/make-distribution.sh -Phadoop-cloud
```
**BEFORE**
```
scala> spark.range(10).write.mode("overwrite").orc("s3a://express-1-zone--***--x-s3/orc/")
...
24/03/25 15:50:46 WARN ConfigurationHelper: Option fs.s3a.connection.establish.timeout is too low (5,000 ms). Setting to 15,000 ms instead
```
**AFTER**
```
scala> spark.range(10).write.mode("overwrite").orc("s3a://express-1-zone--***--x-s3/orc/")
...(ConfigurationHelper warning is gone)...
```
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #45710 from dongjoon-hyun/SPARK-47552.
Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
---
core/src/main/scala/org/apache/spark/SparkContext.scala | 3 +++
1 file changed, 3 insertions(+)
diff --git a/core/src/main/scala/org/apache/spark/SparkContext.scala
b/core/src/main/scala/org/apache/spark/SparkContext.scala
index d519617c4095..f8f0107ed139 100644
--- a/core/src/main/scala/org/apache/spark/SparkContext.scala
+++ b/core/src/main/scala/org/apache/spark/SparkContext.scala
@@ -417,6 +417,9 @@ class SparkContext(config: SparkConf) extends Logging {
     if (!_conf.contains("spark.app.name")) {
       throw new SparkException("An application name must be set in your configuration")
     }
+    // HADOOP-19097 Set fs.s3a.connection.establish.timeout to 30s
+    // We can remove this after Apache Hadoop 3.4.1 releases
+    conf.setIfMissing("spark.hadoop.fs.s3a.connection.establish.timeout", "30s")
// This should be set as early as possible.
SparkContext.fillMissingMagicCommitterConfsIfNeeded(_conf)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]