anshuksi282-ksolves opened a new pull request, #68624:
URL: https://github.com/apache/airflow/pull/68624

   ## Fix sensitive data leak in SparkSubmitOperator truncated templates
   
   ### Description
   
   This PR fixes a bug where sensitive values stored under keys such as 
`spark.hadoop.fs.s3a.bucket.spark.access.key` or 
`spark.sql.catalog.kometa.token` in Spark and Hadoop configuration 
dictionaries were shown unmasked in the Airflow UI's Rendered Template view 
when the field was truncated.
   
   ---
   
   ### Root Cause
   
   Spark, Hadoop, and Kubernetes configuration keys frequently use dots or 
dashes as separators. The `SecretsMasker`'s `should_hide_value_for_key` method 
performed a direct substring check against the underscore-style default 
sensitive fields (e.g., `access_key`, `secret_key`). Because `access.key` and 
`access-key` do not contain the substring `access_key`, the masker never 
treated these keys as sensitive and left their values unmasked.
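
The miss described above can be reproduced with a minimal sketch. `SENSITIVE_FIELDS` is an illustrative subset and `naive_should_hide` is a hypothetical stand-in for the pre-fix check; neither is Airflow's actual code.

```python
# Illustrative subset of underscore-style sensitive names (an assumption,
# not Airflow's full default list).
SENSITIVE_FIELDS = frozenset({"access_key", "secret_key", "token"})


def naive_should_hide(name: str) -> bool:
    """Hypothetical pre-fix check: plain substring match, no normalization."""
    lowered = name.strip().lower()
    return any(field in lowered for field in SENSITIVE_FIELDS)


# Underscore-style key: the substring "access_key" is present.
print(naive_should_hide("aws_access_key"))  # True

# Dotted and dashed keys: "access.key" / "access-key" never contain
# "access_key", so the check misses them.
print(naive_should_hide("spark.hadoop.fs.s3a.bucket.spark.access.key"))  # False
print(naive_should_hide("my-access-key"))  # False
```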
   
   ---
   
   ### Changes Made
   
   * **Core Fix:** Updated `should_hide_value_for_key` in 
`airflow_shared/secrets_masker/secrets_masker.py` to replace each run of 
non-word characters (dots, dashes, slashes) with a single underscore using 
`re.sub(r"\W+", "_", name)` before the substring check. Dotted and dashed 
config keys now match the underscore-style sensitive fields.
   
   * **Unit Tests:** Added positive and negative test cases in 
`test_secrets_masker.py` to ensure dotted/dashed keys are masked, while normal 
configs (like `spark.executor.memory`) remain unmasked.
   
   * **Integration Tests:** Added a specific test in `test_helpers.py` 
(`test_serialize_template_field_masks_dotted_sensitive_keys_on_truncation`) to 
reproduce and verify the fix under truncation scenarios (padding payload beyond 
`MAX_TEMPLATED_FIELD_LENGTH`).
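
The normalization step can be sketched roughly as follows. Only the `re.sub(r"\W+", "_", name)` call comes from this PR; `SENSITIVE_FIELDS` is an illustrative subset and `should_hide_sketch` is a hypothetical simplification of `should_hide_value_for_key`, not the real method.

```python
import re

# Illustrative subset of underscore-style sensitive names (an assumption,
# not Airflow's full default list).
SENSITIVE_FIELDS = frozenset({"access_key", "secret_key", "token"})


def should_hide_sketch(name: object) -> bool:
    """Hypothetical post-fix check: normalize separators, then substring match."""
    if not isinstance(name, str):
        return False
    # Collapse every run of non-word characters (".", "-", "/", ...) into "_".
    normalized = re.sub(r"\W+", "_", name.strip().lower())
    return any(field in normalized for field in SENSITIVE_FIELDS)


# Keys taken from the test output in this PR, with the expected result.
CASES = {
    "spark.hadoop.fs.s3a.bucket.spark.access.key": True,
    "spark.hadoop.fs.s3a.bucket.spark.secret.key": True,
    "spark.sql.catalog.kometa.token": True,
    "my-access-key": True,
    "auth.example.com/token": True,
    "spark.executor.memory": False,
    "spark.driver.cores": False,
}

for key, expected in CASES.items():
    assert should_hide_sketch(key) is expected, key
```

Because the match is still a substring test on the normalized name, any key whose normalized form contains a sensitive field is masked, which is why the negative cases matter as a guard against over-masking.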
   
   ---
   
   ### Before / After
   
   #### Before (without fix)
   
   `spark.hadoop...access.key` and `my-access-key` were **not** masked, so 
2 tests failed:
   
   ```
   FAILED test_hiding_defaults[spark.hadoop.fs.s3a.bucket.spark.access.key-True]
   FAILED test_hiding_defaults[my-access-key-True]
   
   25 passed, 2 failed
   ```
   
   #### After (with fix)
   
   All dotted/dashed sensitive keys are now masked, and all 27 tests pass:
   
   ```
   PASSED test_hiding_defaults[spark.hadoop.fs.s3a.bucket.spark.access.key-True]
   PASSED test_hiding_defaults[spark.hadoop.fs.s3a.bucket.spark.secret.key-True]
   PASSED test_hiding_defaults[spark.sql.catalog.kometa.token-True]
   PASSED test_hiding_defaults[my-access-key-True]
   PASSED test_hiding_defaults[auth.example.com/token-True]
   PASSED test_hiding_defaults[spark.executor.memory-False]
   PASSED test_hiding_defaults[spark.driver.cores-False]
   
   27 passed, 0 failed
   ```
   
   ---
   
   ### Closes
   
   Fixes #67459
   
   ---
   
   ##### Was generative AI tooling used to co-author this PR?
   - [x] Yes (please specify the tool below)
   Generated-by: Claude Code (Opus 4.8) following [the 
guidelines](https://github.com/apache/airflow/blob/main/contributing-docs/05_pull_requests.rst#gen-ai-assisted-contributions)

