josh-fell opened a new issue #19883: URL: https://github.com/apache/airflow/issues/19883
### Apache Airflow Provider(s)

microsoft-azure

### Versions of Apache Airflow Providers

Latest of all providers available on Airflow `main`.

### Apache Airflow version

main (development)

### Operating System

Debian GNU/Linux 10 (buster)

### Deployment

Other

### Deployment details

Using Breeze on `main` branch.

### What happened

When authenticating to Azure Blob Storage using the ["Shared Key" method](https://docs.microsoft.com/en-us/rest/api/storageservices/authorize-with-shared-key) (which only includes the shared access key and storage account URL), both the key value and URL are printed in plain text to the task logs.

Example log entry:

```
[2021-11-29, 21:57:53 EST] {base.py:79} INFO - Using connection to: id: wasb_default. Host: https://myAccountBlahBlah.blob.core.windows.net/, Port: None, Schema: , Login: , Password: None, extra: {'extra__wasb__connection_string': '', 'extra__wasb__sas_token': '***', 'extra__wasb__shared_access_key': 'KEa1QvMjxMNLuzTgVZFvS6PpQv087Ls0Oq+7Ic/fa9Lu3RQwunHi61yZTCJSCIo1gZBNLuzTgVZFvS6PpQv087Ls0Oq+7Ic/fa9Lu3RQwunHi61yZTCJSCIo1gZBH1cc/KZb3EAKXrqWXXXXXXXXXXXXXXXXXXXXXX==', 'extra__wasb__tenant_id': ''}
```

### What you expected to happen

The entirety of the credentials used to authenticate to Azure Blob Storage should not be visible in plain text within the task logs.

### How to reproduce

- Create an Azure Blob Storage connection populating the "Shared Access Key" and "Account Name" fields in the UI or creating a connection using the `shared_access_key`/`extra__wasb__shared_access_key` extra and `host`.
- Use a simplified version of the existing `example_local_to_wasb` DAG in the Azure provider:

```python
import os
from datetime import datetime

from airflow.models import DAG
from airflow.providers.microsoft.azure.operators.wasb_delete_blob import WasbDeleteBlobOperator
from airflow.providers.microsoft.azure.transfers.local_to_wasb import LocalFilesystemToWasbOperator

PATH_TO_UPLOAD_FILE = os.environ.get('AZURE_PATH_TO_UPLOAD_FILE', 'example-text.txt')

with DAG(
    "example_local_to_wasb",
    schedule_interval="@once",
    start_date=datetime(2021, 1, 1),
    catchup=False,
    default_args={"container_name": "mycontainer", "blob_name": "myblob"},
) as dag:
    upload = LocalFilesystemToWasbOperator(
        task_id="upload_file", file_path=PATH_TO_UPLOAD_FILE, load_options={"overwrite": True}
    )
```

### Anything else

While a fix is relatively straightforward, feedback on the best approach would be appreciated. There seem to be a few options:

- Update the `WasbHook` to use the `password` connection field instead of the custom `extra__wasb__shared_access_key` field.
  - Suggested by @ashb in a discussion on #19497
  - Obviously there is some tradeoff between custom connection fields/extras, which (presumably) make connection creation more straightforward and targeted, and reusing core connection fields, which then take on multiple meanings and perhaps make creating the connection more "burdensome".
  - What does backwards compat look like in this case? Or is it valid to call this out as a "breaking change"?
- Add "shared_access_key" to `DEFAULT_SENSITIVE_FIELDS` to ensure the shared key value is masked in the connection `extras` when logged.
  - Perhaps a bit of a heavy-handed change
- Update the [logging logic](https://github.com/apache/airflow/blob/387893ae1820ca1553d05146ad9d2baaa0c0a519/airflow/hooks/base.py#L69-L80) in `airflow.hooks.base.get_connection()` to only log what connection ID is being used rather than all of the connection details.
  - Unclear what the user impact would be if folks use these particular details in log analysis or simply find it handy to have in the logs.
  - (Slightly-related side question: why is the logging only written when `host` is provided in a connection?)

### Are you willing to submit PR?

- [X] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
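
For illustration, the second option listed under "Anything else" (adding "shared_access_key" to the sensitive field names) can be sketched in plain Python. This is a minimal sketch, not Airflow's implementation: it assumes masking is decided by a case-insensitive substring match of each extras key against a set of sensitive names. `SENSITIVE_FIELDS`, `should_hide`, and `redact_extras` are hypothetical names, and the set contents are illustrative rather than copied from `DEFAULT_SENSITIVE_FIELDS`.

```python
from typing import Any, Dict, FrozenSet

# Illustrative set only (assumption); not copied from Airflow's DEFAULT_SENSITIVE_FIELDS.
SENSITIVE_FIELDS: FrozenSet[str] = frozenset({"password", "secret", "token"})


def should_hide(key: str, sensitive: FrozenSet[str]) -> bool:
    """Return True when any sensitive name is a substring of the lowercased key."""
    lowered = key.lower()
    return any(name in lowered for name in sensitive)


def redact_extras(extras: Dict[str, Any], sensitive: FrozenSet[str]) -> Dict[str, Any]:
    """Return a copy of extras with non-empty sensitive values replaced by '***'."""
    return {
        key: "***" if value and should_hide(key, sensitive) else value
        for key, value in extras.items()
    }


# Placeholder values standing in for the extras shown in the log entry.
extras = {
    "extra__wasb__connection_string": "",
    "extra__wasb__sas_token": "sv=placeholder",
    "extra__wasb__shared_access_key": "not-a-real-key",
    "extra__wasb__tenant_id": "",
}

# With the illustrative defaults, only the SAS token key matches ("token").
before = redact_extras(extras, SENSITIVE_FIELDS)

# Adding "shared_access_key" to the set masks the shared key as well.
after = redact_extras(extras, SENSITIVE_FIELDS | {"shared_access_key"})
```

Under this assumption, `before` reproduces the behavior in the log entry (the SAS token is masked while the shared key is not), and `after` masks both. The tradeoff noted in the issue still applies: a broader set of sensitive names affects every connection whose extras happen to match, not just `wasb` connections.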
