galafis opened a new pull request, #63738:
URL: https://github.com/apache/airflow/pull/63738

   ## What this fixes
   
   Closes #54981
   
   When `BeamRunPythonPipelineOperator` runs in **non-deferred** mode, 
Airflow's `SecretsMasker` picks up sensitive values (passwords, API tokens, 
etc.) from the standard logging pipeline and redacts them automatically. 
However, in **deferred** mode the operator hands off to 
`BeamPythonPipelineTrigger`, which stores the raw `variables` dict. When the 
triggerer process logs the trigger's state (via `__repr__` → `serialize()`), 
those sensitive values show up unmasked in the logs.
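A minimal, self-contained sketch makes the leak path concrete. `FakeBeamTrigger` is a hypothetical stand-in, not the real `BeamPythonPipelineTrigger`. Its `__repr__` is modeled loosely on the `__repr__` → `serialize()` chain described above, so anything in the serialized kwargs, secrets included, ends up verbatim in the string that gets logged.

```python
from typing import Any


class FakeBeamTrigger:
    """Hypothetical stand-in for a Beam trigger; not the real Airflow class."""

    def __init__(self, variables: dict[str, Any]) -> None:
        # The trigger keeps the raw pipeline variables, secrets included.
        self.variables = variables

    def serialize(self) -> tuple[str, dict[str, Any]]:
        # Same (classpath, kwargs) shape that Airflow triggers return.
        return ("example.FakeBeamTrigger", {"variables": self.variables})

    def __repr__(self) -> str:
        # repr() is built from serialize(), so every kwarg is rendered as-is.
        classpath, kwargs = self.serialize()
        kwargs_str = ", ".join(f"{k}={v}" for k, v in kwargs.items())
        return f"<{classpath} {kwargs_str}>"


trigger = FakeBeamTrigger({"job_name": "wordcount", "api_key": "s3cr3t-value"})
print(repr(trigger))  # the api_key value appears in plain text
```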
   
   ## How it works
   
   This PR adds a small helper `_register_sensitive_variables()` that scans the 
pipeline `variables` dict for keys commonly associated with credentials 
(`password`, `token`, `api_key`, `secret`, etc.) and registers their values 
with `mask_secret()` from Airflow's `SecretsMasker`. This is called in 
`__init__` of both `BeamPythonPipelineTrigger` and `BeamJavaPipelineTrigger`, 
so the masker knows about those values before any logging happens.
   
   The approach is consistent with how other parts of Airflow handle secret 
masking: values are registered so that the global `SecretsMasker` filter can 
redact them wherever they appear in log output.
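The following deliberately simplified, stdlib-only sketch shows why registering values up front is enough. `SimpleSecretsFilter` is hypothetical and is not Airflow's `SecretsMasker`, which does considerably more. It only demonstrates the mechanism: once a value is registered, any handler carrying the filter replaces it with `***`, no matter which code path produced the log line.

```python
import logging
import re
from typing import Optional


class SimpleSecretsFilter(logging.Filter):
    """Hypothetical, simplified stand-in for a secrets-masking log filter."""

    def __init__(self) -> None:
        super().__init__()
        self._secrets: set[str] = set()
        self._pattern: Optional[re.Pattern] = None

    def add_mask(self, secret: str) -> None:
        """Register a value and rebuild one regex matching any known secret."""
        if not secret:
            return
        self._secrets.add(secret)
        # Longest first, so a secret that contains another is replaced whole.
        alternatives = sorted(self._secrets, key=len, reverse=True)
        self._pattern = re.compile("|".join(re.escape(s) for s in alternatives))

    def redact(self, text: str) -> str:
        return self._pattern.sub("***", text) if self._pattern else text

    def filter(self, record: logging.LogRecord) -> bool:
        # Format the final message first, so secrets passed as %-args are caught.
        record.msg = self.redact(record.getMessage())
        record.args = ()
        return True


masker = SimpleSecretsFilter()
masker.add_mask("hunter2")

handler = logging.StreamHandler()
handler.addFilter(masker)
logger = logging.getLogger("triggerer-demo")
logger.addHandler(handler)
logger.propagate = False

# Emits: Trigger state: {'db_password': '***'}
logger.warning("Trigger state: %s", {"db_password": "hunter2"})
```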
   
   ## Changes
   
   - Added `_SENSITIVE_VARIABLE_KEYS` frozenset with common sensitive key 
patterns
   - Added `_register_sensitive_variables()` helper function
   - Called the helper in `BeamPythonPipelineTrigger.__init__()` and 
`BeamJavaPipelineTrigger.__init__()` 
   - Added import for `mask_secret` from `airflow.utils.log.secrets_masker`
