This is an automated email from the ASF dual-hosted git repository.
dongjoon pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-3.5 by this push:
new 44f8766f7759 [SPARK-48930][CORE] Redact `awsAccessKeyId` by including `accesskey` pattern
44f8766f7759 is described below
commit 44f8766f77597c68a5ba5a6300bbc507f57290b4
Author: Dongjoon Hyun <[email protected]>
AuthorDate: Wed Jul 17 15:57:03 2024 -0700
[SPARK-48930][CORE] Redact `awsAccessKeyId` by including `accesskey` pattern
### What changes were proposed in this pull request?
This PR aims to redact `awsAccessKeyId` by including the `accesskey` pattern in the default redaction regex.
- **Apache Spark 4.0.0-preview1**
There is no point in redacting only `fs.s3a.access.key`, because the same value is
still exposed via `fs.s3.awsAccessKeyId`, for example when the shell is launched as
follows. All of these keys need to be redacted.
```
$ AWS_ACCESS_KEY_ID=A AWS_SECRET_ACCESS_KEY=B bin/spark-shell
```

### Why are the changes needed?
Since Apache Spark 1.1.0, the value of `AWS_ACCESS_KEY_ID` has been propagated into
several Hadoop configuration keys, as in the code linked below. However, Apache
Spark does not redact all of them consistently.
- #450
https://github.com/apache/spark/blob/5d16c3134c442a5546251fd7c42b1da9fdf3969e/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala#L481-L486
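As a rough illustration of the propagation described above, the sketch below copies one environment value into the two access-key names mentioned in this message. It is a simplified stand-in, not the actual `SparkHadoopUtil` code: the object name, the `accessKeyConf` helper, and the use of a plain `Map` in place of a Hadoop configuration object are all assumptions made for the example.

```scala
object S3EnvPropagationSketch {
  // Hypothetical helper: a plain immutable Map stands in for the Hadoop
  // configuration. Only the two key names from this commit message are used.
  def accessKeyConf(env: Map[String, String]): Map[String, String] =
    env.get("AWS_ACCESS_KEY_ID") match {
      case Some(keyId) =>
        // The same credential value ends up under more than one key name.
        Map(
          "fs.s3.awsAccessKeyId" -> keyId,
          "fs.s3a.access.key" -> keyId)
      case None => Map.empty
    }

  def main(args: Array[String]): Unit = {
    // Mirrors the spark-shell invocation above, where AWS_ACCESS_KEY_ID=A.
    val conf = accessKeyConf(Map("AWS_ACCESS_KEY_ID" -> "A"))
    conf.foreach { case (k, v) => println(s"$k=$v") }
  }
}
```

Because one value is written under several names, redacting only one of those names still leaves the credential visible under the others.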
### Does this PR introduce _any_ user-facing change?
Users may see more redactions for configurations whose names contain
`accesskey` (matched case-insensitively). However, such configurations are
highly likely to be credential-related.
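The effect of the pattern change can be checked in isolation. The following is a minimal sketch that copies the old and new default patterns from the diff in this commit and tests them against a few key names. The object name and the `isSensitive` helper are hypothetical and exist only for this example; it assumes a key counts as sensitive when the pattern matches anywhere in its name.

```scala
import scala.util.matching.Regex

object RedactionPatternSketch {
  // Default patterns copied from the diff in this commit.
  val oldPattern: Regex = "(?i)secret|password|token|access[.]key".r
  val newPattern: Regex = "(?i)secret|password|token|access[.]?key".r

  // Hypothetical helper: true if the pattern matches anywhere in the key name.
  def isSensitive(pattern: Regex, key: String): Boolean =
    pattern.findFirstIn(key).isDefined

  def main(args: Array[String]): Unit = {
    val keys = Seq(
      "spark.hadoop.fs.s3.awsAccessKeyId",
      "spark.hadoop.fs.s3a.access.key",
      "spark.my.password")
    keys.foreach { k =>
      println(s"$k old=${isSensitive(oldPattern, k)} new=${isSensitive(newPattern, k)}")
    }
  }
}
```

With the optional dot (`[.]?`), the new pattern matches both `access.key` and `AccessKey`, while the old pattern required a literal dot and therefore missed `awsAccessKeyId`.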
### How was this patch tested?
Pass the CIs with the newly added test cases.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #47392 from dongjoon-hyun/SPARK-48930.
Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit 1e17c392b4def939d04e556084e7b48cca86412b)
Signed-off-by: Dongjoon Hyun <[email protected]>
---
core/src/main/scala/org/apache/spark/internal/config/package.scala | 2 +-
core/src/test/scala/org/apache/spark/util/UtilsSuite.scala | 1 +
2 files changed, 2 insertions(+), 1 deletion(-)
diff --git a/core/src/main/scala/org/apache/spark/internal/config/package.scala b/core/src/main/scala/org/apache/spark/internal/config/package.scala
index c5e23cae1f84..938d6ec2e01b 100644
--- a/core/src/main/scala/org/apache/spark/internal/config/package.scala
+++ b/core/src/main/scala/org/apache/spark/internal/config/package.scala
@@ -1155,7 +1155,7 @@ package object config {
"like YARN and event logs.")
.version("2.1.2")
.regexConf
- .createWithDefault("(?i)secret|password|token|access[.]key".r)
+ .createWithDefault("(?i)secret|password|token|access[.]?key".r)
private[spark] val STRING_REDACTION_PATTERN =
ConfigBuilder("spark.redaction.string.regex")
diff --git a/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala b/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala
index 7923e81949db..1a7bfc64c23c 100644
--- a/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala
+++ b/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala
@@ -1093,6 +1093,7 @@ class UtilsSuite extends SparkFunSuite with ResetSystemProperties {
// Set some secret keys
val secretKeys = Seq(
"spark.executorEnv.HADOOP_CREDSTORE_PASSWORD",
+ "spark.hadoop.fs.s3.awsAccessKeyId",
"spark.hadoop.fs.s3a.access.key",
"spark.my.password",
"spark.my.sECreT")