varaprasadregani opened a new issue, #58514: URL: https://github.com/apache/airflow/issues/58514
### What do you see as an issue?

**Docs link:** [Masking Sensitive Data](https://airflow.apache.org/docs/apache-airflow/stable/security/secrets/mask-sensitive-values.html#masking-sensitive-data)

**Source code link:** [secrets_masker.py](https://github.com/apache/airflow/blob/main/airflow/shared/secrets_masker/src/airflow_shared/secrets_masker/secrets_masker.py)

The documentation currently states:

> “Airflow will by default mask Connection passwords and keys from a Connection’s extra (JSON) field when they appear in Task logs, in the Variable and in the Rendered fields views of the UI.”

This statement is misleading because it implies that **all** keys in a Connection’s extra JSON field are masked. However, only keys whose names contain known sensitive keywords are actually redacted. The complete list of sensitive keywords from the source code is:

`access_token`, `api_key`, `apikey`, `authorization`, `passphrase`, `passwd`, `password`, `private_key`, `secret`, `token`, `keyfile_dict`, `service_account`

**Code used to reproduce this:**

I verified this behavior using the following DAG, extracting values from a Connection's `extra` field (`bigquery_connection_id`) and Airflow Variables:

```python
from airflow import DAG
from airflow.operators.bash import BashOperator
import pendulum
from airflow.hooks.base import BaseHook
from airflow.models import Variable

# Fetch connection and extract 'extra' JSON
conn = BaseHook.get_connection(conn_id="bigquery_connection_id")
extra_data = conn.extra_dejson

# Test specific keys from 'extra'
keyfile_dict = extra_data.get("keyfile_dict", "not found")  # Contains 'keyfile_dict'
param1_token = extra_data.get("param1_token", "not found")  # Contains 'token'
hello = extra_data.get("hello", "not found")  # No sensitive keyword

# Test Variables
test_keyfile_dict = Variable.get("test_keyfile_dict")
service_account = Variable.get("service_account")

with DAG(
    dag_id="test_masking",
    start_date=pendulum.datetime(2024, 1, 1, tz="UTC"),
    schedule=None,
    catchup=False,
) as dag:
    test_masking = BashOperator(
        task_id="masking_task",
        bash_command=f"echo '{ keyfile_dict }' > { param1_token } > { hello } > { test_keyfile_dict } > { service_account }"
    )
```

**From my testing:**

* `extra__google_cloud_platform__keyfile_dict` (from the connection’s extra JSON) → **Masked** everywhere (Rendered templates, UI, logs).
* `hello` (no sensitive keyword) → **Not masked**.
* `Variable.get("test_keyfile_dict")` → **Masked** only in Variables UI.
* `Variable.get("service_account")` → **Masked** in Variables UI, Rendered templates, and logs.

The docs should clarify that not all keys in a Connection’s extra JSON are masked: only those containing a sensitive keyword.

**Screenshots of observations:**

Rendered Templates:
<img width="754" height="170" alt="Image" src="https://github.com/user-attachments/assets/461a937c-4319-465b-a6ff-9f329ce0f9c8" />

Logs:
<img width="900" height="368" alt="Image" src="https://github.com/user-attachments/assets/1e9aa32c-fe9b-41ef-851d-ca8b5e941f29" />

Variables UI:
<img width="1304" height="318" alt="Image" src="https://github.com/user-attachments/assets/29d47709-618f-4f68-bce4-5941983deeff" />

### Solving the problem

I suggest two specific updates to the documentation to fix this ambiguity and clarify the scope of masking:

**1. Update the default masking paragraph**

Clarify that masking is conditional on the key name.

*Current Text:*

> Airflow will by default mask Connection passwords and sensitive Variables and keys from a Connection’s extra (JSON) field when they appear in Task logs, in the Variable and in the Rendered fields views of the UI.

*Proposed Text:*

> Airflow will by default mask Connection passwords, sensitive Variables, and keys from a Connection’s extra (JSON) field **whose names contain one or more of the sensitive keywords** when they appear in Task logs, in the Variables UI, and in the Rendered fields views of the UI.
> Keys in the extra JSON that do not include any of these sensitive keywords will not be redacted automatically.

**2. Update the "Sensitive field names" section**

In the [Sensitive field names](https://airflow.apache.org/docs/apache-airflow/stable/security/secrets/mask-sensitive-values.html#sensitive-field-names) section, explicitly list the default keywords and add a table illustrating how the source and keyword affect where the data is masked.

*Suggested Addition:*

> **Default Sensitive Keywords:**
> `access_token`, `api_key`, `apikey`, `authorization`, `passphrase`, `passwd`, `password`, `private_key`, `secret`, `token`, `keyfile_dict`, `service_account`.
>
> **Examples of Masking Behavior:**
>
> | Source | Key / Variable Name | Matching Keyword | Masking Scope |
> | :--- | :--- | :--- | :--- |
> | **Connection Extra** | `google_keyfile_dict` | `keyfile_dict` | **Everywhere** (Logs, Rendered Templates, UI) |
> | **Connection Extra** | `hello` | *None* | **Not Masked** |
> | **Variable** | `service_account` | `service_account` | **Everywhere** (Logs, Rendered Templates, UI) |
> | **Variable** | `test_keyfile_dict` | `keyfile_dict` | **Variables UI Only** |

### Anything else

_No response_

### Are you willing to submit PR?

- [x] Yes I am willing to submit a PR!

### Code of Conduct

- [x] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
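
**Illustrative sketch of the keyword rule**

To make the rule described in this issue concrete, here is a minimal standalone sketch of name-based matching: a key counts as sensitive when its name contains any of the listed keywords as a substring. This is not Airflow's actual `secrets_masker` code. The names `DEFAULT_SENSITIVE_KEYWORDS`, `name_looks_sensitive`, and `redact_extra` are hypothetical, and the case-insensitive comparison is an assumption. It also does not model the difference in masking scope (Variables UI versus logs and Rendered templates) observed above.

```python
# Hypothetical illustration only; not Airflow's actual secrets_masker code.
# The keyword list is copied from the issue text above.
DEFAULT_SENSITIVE_KEYWORDS = frozenset(
    {
        "access_token",
        "api_key",
        "apikey",
        "authorization",
        "passphrase",
        "passwd",
        "password",
        "private_key",
        "secret",
        "token",
        "keyfile_dict",
        "service_account",
    }
)


def name_looks_sensitive(name: str, keywords=DEFAULT_SENSITIVE_KEYWORDS) -> bool:
    """Return True if any keyword occurs as a substring of the key name.

    Lowercasing the name is an assumption made for this sketch.
    """
    lowered = name.strip().lower()
    return any(keyword in lowered for keyword in keywords)


def redact_extra(extra: dict) -> dict:
    """Replace values whose key name looks sensitive with a placeholder."""
    return {
        key: "***" if name_looks_sensitive(key) else value
        for key, value in extra.items()
    }


if __name__ == "__main__":
    # Dummy values standing in for a Connection's extra JSON.
    sample_extra = {
        "google_keyfile_dict": "dummy-keyfile-json",
        "param1_token": "dummy-token",
        "hello": "world",
    }
    for key, value in redact_extra(sample_extra).items():
        print(f"{key} = {value}")
```

Under these assumptions, `google_keyfile_dict` and `param1_token` are redacted because their names contain `keyfile_dict` and `token`, while `hello` passes through unchanged, matching the Connection extra rows in the proposed table.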
