anshuksi282-ksolves opened a new pull request, #68624:
URL: https://github.com/apache/airflow/pull/68624
## Fix sensitive data leak in SparkSubmitOperator truncated templates

### Description

This PR fixes a critical bug where the values of sensitive keys (such as `spark.hadoop.fs.s3a.bucket.spark.access.key` or `spark.sql.catalog.kometa.token`) in Spark and Hadoop configuration dictionaries leaked into the Airflow UI's Rendered Template view when the field was truncated.

---

### Root Cause

Spark, Hadoop, and Kubernetes configuration keys frequently use dot notation or dashes. The `SecretsMasker`'s `should_hide_value_for_key` method performed a plain substring check against the underscore-style default sensitive fields (e.g., `access_key`, `secret_key`). Because `access.key` and `access-key` do not contain the substring `access_key`, the masker never flagged these keys, and their values were shown unmasked.

---

### Changes Made

* **Core Fix:** Updated `should_hide_value_for_key` in `airflow_shared/secrets_masker/secrets_masker.py` to collapse runs of non-word characters (dots, dashes, slashes) into underscores with `re.sub(r"\W+", "_", name)` before the substring check, so dotted and dashed config keys match the underscore-style sensitive fields.
* **Unit Tests:** Added positive and negative test cases in `test_secrets_masker.py` verifying that dotted and dashed sensitive keys are masked while ordinary settings (such as `spark.executor.memory`) remain unmasked.
* **Integration Tests:** Added `test_serialize_template_field_masks_dotted_sensitive_keys_on_truncation` in `test_helpers.py` to reproduce the leak and verify the fix when the field is truncated (the payload is padded beyond `MAX_TEMPLATED_FIELD_LENGTH`).
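The normalization described in the Core Fix can be illustrated with a minimal, self-contained sketch. It assumes a small hypothetical subset of sensitive field names (`access_key`, `password`, `secret`, `token`); Airflow's real default list and its actual `should_hide_value_for_key` (which also consults configuration) may differ. The functions `should_hide_value_for_key_old` and `should_hide_value_for_key_new` are hypothetical stand-ins for the behaviour before and after this change, and the key/expected pairs are taken from the test IDs listed under Before / After.

```python
import re

# Hypothetical subset of sensitive field names, chosen for illustration only;
# it is NOT Airflow's real default list.
SENSITIVE_FIELDS = frozenset({"access_key", "password", "secret", "token"})


def should_hide_value_for_key_old(name: str) -> bool:
    """Sketch of the pre-fix check: plain substring match on the lowercased key."""
    name = name.strip().lower()
    return any(field in name for field in SENSITIVE_FIELDS)


def should_hide_value_for_key_new(name: str) -> bool:
    """Sketch of the post-fix check: collapse each run of non-word characters
    (dots, dashes, slashes, ...) into a single "_" before the substring match."""
    normalized = re.sub(r"\W+", "_", name.strip().lower())
    return any(field in normalized for field in SENSITIVE_FIELDS)


# (key, should_be_masked) pairs taken from the test IDs shown in this PR.
CASES = [
    ("spark.hadoop.fs.s3a.bucket.spark.access.key", True),
    ("spark.hadoop.fs.s3a.bucket.spark.secret.key", True),
    ("spark.sql.catalog.kometa.token", True),
    ("my-access-key", True),
    ("auth.example.com/token", True),
    ("spark.executor.memory", False),
    ("spark.driver.cores", False),
]

if __name__ == "__main__":
    for key, expected in CASES:
        old = should_hide_value_for_key_old(key)
        new = should_hide_value_for_key_new(key)
        print(f"{key}: old={old} new={new} expected={expected}")
```

With this field subset, the old check returns `False` for `spark.hadoop.fs.s3a.bucket.spark.access.key` and `my-access-key` (the two cases that failed before the fix), while the new check agrees with every expected value. Because `\W` matches any non-word character, other separators such as `:` are collapsed as well, and keys that already use underscores are left unchanged.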
---

### Before / After

#### Before (without fix)

`spark.hadoop...access.key` and `my-access-key` were **NOT** masked; 2 tests failed:

```
FAILED test_hiding_defaults[spark.hadoop.fs.s3a.bucket.spark.access.key-True]
FAILED test_hiding_defaults[my-access-key-True]
25 passed, 2 failed
```

#### After (with fix)

All dotted and dashed sensitive keys are masked, and all 27 tests pass:

```
PASSED test_hiding_defaults[spark.hadoop.fs.s3a.bucket.spark.access.key-True]
PASSED test_hiding_defaults[spark.hadoop.fs.s3a.bucket.spark.secret.key-True]
PASSED test_hiding_defaults[spark.sql.catalog.kometa.token-True]
PASSED test_hiding_defaults[my-access-key-True]
PASSED test_hiding_defaults[auth.example.com/token-True]
PASSED test_hiding_defaults[spark.executor.memory-False]
PASSED test_hiding_defaults[spark.driver.cores-False]
27 passed, 0 failed
```

---

### Closes

Fixes #67459

---

##### Was generative AI tooling used to co-author this PR?

- [x] Yes (please specify the tool below)

Generated-by: Claude Code (Opus 4.8) following [the guidelines](https://github.com/apache/airflow/blob/main/contributing-docs/05_pull_requests.rst#gen-ai-assisted-contributions)

---

* Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/main/contributing-docs/05_pull_requests.rst#pull-request-guidelines)** for more information. Note: commit author/co-author name and email in commits become permanently public when merged.
* For fundamental code changes, an Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvement+Proposals)) is needed.
* When adding a dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
* For significant user-facing changes, create a newsfragment `{pr_number}.significant.rst` in [airflow-core/newsfragments](https://github.com/apache/airflow/tree/main/airflow-core/newsfragments).
  You can add this file in a follow-up commit after the PR is created so you know the PR number.
