josh-fell opened a new issue #19883:
URL: https://github.com/apache/airflow/issues/19883


   ### Apache Airflow Provider(s)
   
   microsoft-azure
   
   ### Versions of Apache Airflow Providers
   
   Latest of all providers available on Airflow `main`.
   
   
   ### Apache Airflow version
   
   main (development)
   
   ### Operating System
   
   Debian GNU/Linux 10 (buster)
   
   ### Deployment
   
   Other
   
   ### Deployment details
   
   Using Breeze on `main` branch.
   
   ### What happened
   
   When authenticating to Azure Blob Storage using the ["Shared Key" 
method](https://docs.microsoft.com/en-us/rest/api/storageservices/authorize-with-shared-key)
 (which only includes the shared access key and storage account URL), both the 
key value and URL are printed in plain text to the task logs.
   
   Example log entry:
   ```
   [2021-11-29, 21:57:53 EST] {base.py:79} INFO - Using connection to: id: 
wasb_default. Host: https://myAccountBlahBlah.blob.core.windows.net/, Port: 
None, Schema: , Login: , Password: None, extra: 
{'extra__wasb__connection_string': '', 'extra__wasb__sas_token': '***', 
'extra__wasb__shared_access_key': 
'KEa1QvMjxMNLuzTgVZFvS6PpQv087Ls0Oq+7Ic/fa9Lu3RQwunHi61yZTCJSCIo1gZBNLuzTgVZFvS6PpQv087Ls0Oq+7Ic/fa9Lu3RQwunHi61yZTCJSCIo1gZBH1cc/KZb3EAKXrqWXXXXXXXXXXXXXXXXXXXXXX==',
 'extra__wasb__tenant_id': ''}
   ```
   
   ### What you expected to happen
   
   The entirety of the credentials used to authenticate to Azure Blob Storage 
should not be visible in plain text within the task logs.
   
   ### How to reproduce
   
   - Create an Azure Blob Storage connection populating the "Shared Access Key" 
and "Account Name" fields in the UI or creating a connection using the 
`shared_access_key`/`extra__wasb__shared_access_key` extra and `host`.
   - Use a simplified version of the existing `example_local_to_wasb` DAG in 
the Azure provider:
   ```python
   import os
   from datetime import datetime
   
   from airflow.models import DAG
   from airflow.providers.microsoft.azure.operators.wasb_delete_blob import 
WasbDeleteBlobOperator
   from airflow.providers.microsoft.azure.transfers.local_to_wasb import 
LocalFilesystemToWasbOperator
   
   PATH_TO_UPLOAD_FILE = os.environ.get('AZURE_PATH_TO_UPLOAD_FILE', 
'example-text.txt')
   
   with DAG(
       "example_local_to_wasb",
       schedule_interval="@once",
       start_date=datetime(2021, 1, 1),
       catchup=False,
       default_args={"container_name": "mycontainer", "blob_name": "myblob"},
   ) as dag:
       upload = LocalFilesystemToWasbOperator(
           task_id="upload_file", file_path=PATH_TO_UPLOAD_FILE, 
load_options={"overwrite": True}
       )
   ```
   
   ### Anything else
   
   While a fix is relatively straightforward, feedback on the best approach 
would be appreciated. There seem to be a few options:
   
   - Update the `WasbHook` to use the `password` connection field instead of 
the custom `extra__wasb__shared_access_key` field.
     - Suggested by @ashb in a discussion on #19497
     - Obviously some tradeoff between custom connection fields/extras making 
connection creation (presumably) more straightforward/targeted vs reusing core 
connection fields such that they take on multiple meanings and perhaps more 
"burdensome" to create said connection.
     - What does backwards compat look like in this case? Or is it valid to 
call this out as a "breaking change"?
   - Add "shared_access_key" to `DEFAULT_SENSITIVE_FIELDS` to ensure the shared 
key value is masked in the connection `extras` when logged
     - Perhaps a bit heavy-handed of a change
   - Update the [logging 
logic](https://github.com/apache/airflow/blob/387893ae1820ca1553d05146ad9d2baaa0c0a519/airflow/hooks/base.py#L69-L80)
 in `airflow.hooks.base.get_connection()` to only log what connection ID is 
being used rather than all of the connection details.
     - Unclear what the user impact would be if folks use these particular 
details in log analysis or simply find it handy to have in the logs.
     - (Slightly-related side question: why is the logging only written when 
`host` is provided in a connection?)
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to