This is an automated email from the ASF dual-hosted git repository.
dongjoon pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-3.4 by this push:
new 674d4dbb4a56 [SPARK-48930][CORE] Redact `awsAccessKeyId` by including `accesskey` pattern
674d4dbb4a56 is described below
commit 674d4dbb4a5648ffe888f5115ba1de62236fd182
Author: Dongjoon Hyun <[email protected]>
AuthorDate: Wed Jul 17 15:57:03 2024 -0700
[SPARK-48930][CORE] Redact `awsAccessKeyId` by including `accesskey` pattern
### What changes were proposed in this pull request?
This PR aims to redact `awsAccessKeyId` by including `accesskey` pattern.
- **Apache Spark 4.0.0-preview1**
There is no point in redacting only `fs.s3a.access.key`, because the same value
is exposed via `fs.s3.awsAccessKeyId`, as in the following case. We need to redact all of them.
```
$ AWS_ACCESS_KEY_ID=A AWS_SECRET_ACCESS_KEY=B bin/spark-shell
```

### Why are the changes needed?
Since Apache Spark 1.1.0, `AWS_ACCESS_KEY_ID` has been propagated as shown in
the following. However, Apache Spark does not redact all of the resulting values consistently.
- #450
https://github.com/apache/spark/blob/5d16c3134c442a5546251fd7c42b1da9fdf3969e/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala#L481-L486
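To illustrate why one environment variable ends up under more than one configuration name, here is a minimal sketch. It is not the code at the link above: `S3CredentialPropagationSketch` and `propagate` are hypothetical names, and the sketch only copies the value into the two keys named in this commit message.

```scala
object S3CredentialPropagationSketch {
  // Hypothetical, simplified stand-in for the propagation step: copy the
  // access key ID from the environment into both configuration keys
  // mentioned in this commit message.
  def propagate(env: Map[String, String]): Map[String, String] =
    env.get("AWS_ACCESS_KEY_ID") match {
      case Some(keyId) =>
        Map(
          "fs.s3.awsAccessKeyId" -> keyId,
          "fs.s3a.access.key" -> keyId)
      case None =>
        Map.empty[String, String]
    }

  def main(args: Array[String]): Unit = {
    // Same value as in the spark-shell example above.
    val conf = propagate(Map("AWS_ACCESS_KEY_ID" -> "A"))
    conf.foreach { case (k, v) => println(s"$k=$v") }
  }
}
```

Because both keys carry the same value, redacting only one of them still leaves the credential visible under the other.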
### Does this PR introduce _any_ user-facing change?
Users may see more redactions on configurations whose names contain
`accesskey` (matched case-insensitively). However, those configurations are
highly likely to be credential-related.
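The effect of the pattern change can be checked with a small sketch that compares the old and new default regexes from the diff in this commit. `RedactionPatternSketch` and `isSensitive` are hypothetical names used only for illustration. The sketch assumes a key counts as sensitive when the pattern matches anywhere in its name.

```scala
import scala.util.matching.Regex

object RedactionPatternSketch {
  // Default patterns before and after this change, copied from the diff.
  val oldPattern: Regex = "(?i)secret|password|token|access[.]key".r
  val newPattern: Regex = "(?i)secret|password|token|access[.]?key".r

  // Hypothetical helper (assumption): a key is treated as sensitive when
  // the pattern matches anywhere in its name.
  def isSensitive(pattern: Regex, key: String): Boolean =
    pattern.findFirstIn(key).isDefined

  def main(args: Array[String]): Unit = {
    val keys = Seq(
      "spark.hadoop.fs.s3.awsAccessKeyId",
      "spark.hadoop.fs.s3a.access.key",
      "spark.my.password")
    keys.foreach { k =>
      val before = isSensitive(oldPattern, k)
      val after = isSensitive(newPattern, k)
      println(s"$k old=$before new=$after")
    }
  }
}
```

The only difference is the `?` after `[.]`, which makes the dot optional, so `AccessKey` matches as well as `access.key`.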
### How was this patch tested?
Pass the CIs with the newly added test cases.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #47392 from dongjoon-hyun/SPARK-48930.
Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit 1e17c392b4def939d04e556084e7b48cca86412b)
Signed-off-by: Dongjoon Hyun <[email protected]>
---
core/src/main/scala/org/apache/spark/internal/config/package.scala | 2 +-
core/src/test/scala/org/apache/spark/util/UtilsSuite.scala | 1 +
2 files changed, 2 insertions(+), 1 deletion(-)
diff --git a/core/src/main/scala/org/apache/spark/internal/config/package.scala b/core/src/main/scala/org/apache/spark/internal/config/package.scala
index be210cfe59b3..caeb8164ae0e 100644
--- a/core/src/main/scala/org/apache/spark/internal/config/package.scala
+++ b/core/src/main/scala/org/apache/spark/internal/config/package.scala
@@ -1123,7 +1123,7 @@ package object config {
"like YARN and event logs.")
.version("2.1.2")
.regexConf
- .createWithDefault("(?i)secret|password|token|access[.]key".r)
+ .createWithDefault("(?i)secret|password|token|access[.]?key".r)
private[spark] val STRING_REDACTION_PATTERN =
ConfigBuilder("spark.redaction.string.regex")
diff --git a/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala b/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala
index 56fb5bf6c6cf..2885cfe1c6b8 100644
--- a/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala
+++ b/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala
@@ -1090,6 +1090,7 @@ class UtilsSuite extends SparkFunSuite with ResetSystemProperties {
// Set some secret keys
val secretKeys = Seq(
"spark.executorEnv.HADOOP_CREDSTORE_PASSWORD",
+ "spark.hadoop.fs.s3.awsAccessKeyId",
"spark.hadoop.fs.s3a.access.key",
"spark.my.password",
"spark.my.sECreT")