This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
     new 44f8766f7759 [SPARK-48930][CORE] Redact `awsAccessKeyId` by including `accesskey` pattern
44f8766f7759 is described below

commit 44f8766f77597c68a5ba5a6300bbc507f57290b4
Author: Dongjoon Hyun <[email protected]>
AuthorDate: Wed Jul 17 15:57:03 2024 -0700

    [SPARK-48930][CORE] Redact `awsAccessKeyId` by including `accesskey` pattern
    
    ### What changes were proposed in this pull request?
    
    This PR aims to redact `awsAccessKeyId` by including the `accesskey` pattern.
    
    - **Apache Spark 4.0.0-preview1**
    Redacting only `fs.s3a.access.key` is pointless because the same value is also exposed via `fs.s3.awsAccessKeyId`, as shown below. All of these keys need to be redacted.
    
    ```
    $ AWS_ACCESS_KEY_ID=A AWS_SECRET_ACCESS_KEY=B bin/spark-shell
    ```
    
    ![Screenshot 2024-07-17 at 12 45 44](https://github.com/user-attachments/assets/e3040c5d-3eb9-4944-a6d6-5179b7647426)
    
    ### Why are the changes needed?
    
    Since Apache Spark 1.1.0, `AWS_ACCESS_KEY_ID` has been propagated to several Hadoop configuration keys, as in the code linked below. However, Apache Spark does not redact all of them consistently.
    - #450
    
    
    https://github.com/apache/spark/blob/5d16c3134c442a5546251fd7c42b1da9fdf3969e/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala#L481-L486
    
    ### Does this PR introduce _any_ user-facing change?
    
    Users may see more redactions for configurations whose names contain `accesskey` (case-insensitive). However, such configurations are very likely to be credential-related.
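    
    As a rough illustration of the effect, the sketch below compares the previous and new default patterns taken from this patch. The `RedactionPatternSketch` object and its `isRedacted` helper are hypothetical names used only for this example, and matching by substring search on the key name is an assumption about how the pattern is applied, not a copy of Spark's redaction code.
    
    ```scala
    import scala.util.matching.Regex

    // Hypothetical sketch, not Spark's implementation.
    object RedactionPatternSketch {
      // Default pattern before this patch: requires a literal dot in "access.key".
      val oldPattern: Regex = "(?i)secret|password|token|access[.]key".r

      // Default pattern after this patch: the dot is optional, so "accesskey" also matches.
      val newPattern: Regex = "(?i)secret|password|token|access[.]?key".r

      // Assumption: a key is redacted when the pattern matches anywhere in its name.
      def isRedacted(pattern: Regex, key: String): Boolean =
        pattern.findFirstIn(key).isDefined
    }
    ```
    
    Under these assumptions, `spark.hadoop.fs.s3.awsAccessKeyId` is matched only by the new pattern, while `spark.hadoop.fs.s3a.access.key` is matched by both.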
    
    ### How was this patch tested?
    
    Pass the CIs with the newly added test cases.
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    No.
    
    Closes #47392 from dongjoon-hyun/SPARK-48930.
    
    Authored-by: Dongjoon Hyun <[email protected]>
    Signed-off-by: Dongjoon Hyun <[email protected]>
    (cherry picked from commit 1e17c392b4def939d04e556084e7b48cca86412b)
    Signed-off-by: Dongjoon Hyun <[email protected]>
---
 core/src/main/scala/org/apache/spark/internal/config/package.scala | 2 +-
 core/src/test/scala/org/apache/spark/util/UtilsSuite.scala         | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/core/src/main/scala/org/apache/spark/internal/config/package.scala b/core/src/main/scala/org/apache/spark/internal/config/package.scala
index c5e23cae1f84..938d6ec2e01b 100644
--- a/core/src/main/scala/org/apache/spark/internal/config/package.scala
+++ b/core/src/main/scala/org/apache/spark/internal/config/package.scala
@@ -1155,7 +1155,7 @@ package object config {
         "like YARN and event logs.")
       .version("2.1.2")
       .regexConf
-      .createWithDefault("(?i)secret|password|token|access[.]key".r)
+      .createWithDefault("(?i)secret|password|token|access[.]?key".r)
 
   private[spark] val STRING_REDACTION_PATTERN =
     ConfigBuilder("spark.redaction.string.regex")
diff --git a/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala b/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala
index 7923e81949db..1a7bfc64c23c 100644
--- a/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala
+++ b/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala
@@ -1093,6 +1093,7 @@ class UtilsSuite extends SparkFunSuite with ResetSystemProperties {
     // Set some secret keys
     val secretKeys = Seq(
       "spark.executorEnv.HADOOP_CREDSTORE_PASSWORD",
+      "spark.hadoop.fs.s3.awsAccessKeyId",
       "spark.hadoop.fs.s3a.access.key",
       "spark.my.password",
       "spark.my.sECreT")


