alexdrydew commented on issue #38745:
URL: https://github.com/apache/airflow/issues/38745#issuecomment-2041073791
My concern was primarily about processing untrusted data from external
sources in DAGs. It seems malicious data can be used to steal secrets in some
cases:
```python
@dag(...)
def pipeline():
    data = download_parameters_from_s3()
    transformed_data = transform.expand_kwargs(data)
    upload_to_s3(transformed_data)
```
In this case the data author could include a
`{{ var.value.get('SOME_SECRET') }}` template and gain access to the variable
if they can read the target storage. I understand that this case is probably
out of scope of the Airflow security model, but the way plain TaskFlow-style
tasks communicate using XCom allows untrusted data to be processed this way.
But not to change focus: my main concern is that even if we don't return
processed untrusted data to a potentially malicious user, we still need to
sanitize inputs specifically for `expand_kwargs` so that we don't fail while
processing data that may contain template-like syntax (e.g. a parsed webpage).
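
To illustrate the kind of sanitizing I mean, here is a minimal sketch. It assumes the mapped values are later rendered as Jinja templates with the default delimiters (`{{`, `{%`, `{#`). `escape_jinja` is a hypothetical helper of my own, not an Airflow API, and it does not cover other template triggers such as file-extension based loading.

```python
import re
from typing import Any

# Opening delimiters of Jinja's default syntax:
# "{{" (expression), "{%" (statement), "{#" (comment).
_JINJA_OPEN = re.compile(r"\{[{%#]")


def escape_jinja(value: Any) -> Any:
    """Hypothetical helper: rewrite strings so they render literally under default Jinja syntax."""
    if isinstance(value, str):
        # Replace each opening delimiter with an expression that renders to
        # that same delimiter, e.g. "{{" becomes "{{ '{{' }}". A closing
        # delimiter without a matching opener is treated as plain text.
        return _JINJA_OPEN.sub(lambda m: "{{ '" + m.group(0) + "' }}", value)
    if isinstance(value, dict):
        # Only values are escaped; keys are left unchanged.
        return {key: escape_jinja(item) for key, item in value.items()}
    if isinstance(value, list):
        return [escape_jinja(item) for item in value]
    if isinstance(value, tuple):
        return tuple(escape_jinja(item) for item in value)
    # Numbers, None, and other objects pass through untouched.
    return value
```

In the pipeline above, something like this would probably have to run inside a task (for example as the last step of `download_parameters_from_s3`), because at DAG-parse time `data` is a lazy reference rather than the actual values. Having to do this by hand for every `expand_kwargs` input is exactly the burden I am worried about.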