jgoedeke commented on issue #51877:
URL: https://github.com/apache/airflow/issues/51877#issuecomment-3406032151
In my use case, I need to work with Azure Data Lake Gen2 URIs of the form:
`abfss://<file_system>@<account_name>.dfs.core.windows.net/<path>`
Here, the portion before the @ is <file_system>, not a username or
credentials, and there is no colon present to indicate a password.
My suggestion: when parsing the netloc, treat the part before `@` as userinfo
(and issue a warning or remove it) only if it actually matches the
`username:password` pattern, i.e. contains a colon. For example:
````python
import warnings

# `parsed` is assumed to be the result of urllib.parse.urlsplit(uri).
parts = parsed.netloc.split("@", 1)
if len(parts) == 2:
    before_at, after_at = parts
    # Only treat as credentials if there is a colon
    if ":" in before_at:
        warnings.warn(
            "An Asset URI should not contain auth info (e.g. username or "
            "password). It has been automatically dropped.",
            UserWarning,
            stacklevel=3,
        )
        normalized_netloc = after_at
    else:
        # No colon: the part before "@" is not credentials, keep it
        normalized_netloc = parsed.netloc
else:
    normalized_netloc = parsed.netloc
````
This change would allow legitimate ABFSS URIs to be accepted as-is, while
still protecting against accidental inclusion of credentials for schemes that
expect them.
For reference, here’s an example of how this logic works for ABFSS:
`abfss://<file_system>@<account_name>.dfs.core.windows.net/path` → no colon,
nothing stripped, should work fine.
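To try the proposed check in isolation, here is a minimal, self-contained sketch. The helper name `normalize_netloc` and the sample names `myfs`, `myaccount`, `user`, and `secret` are hypothetical and exist only for illustration; this is not Airflow's actual sanitization code, only the colon check described above wrapped in a function.

```python
import warnings
from urllib.parse import urlsplit


def normalize_netloc(uri: str) -> str:
    """Return the netloc of ``uri``, dropping userinfo only if it contains a colon.

    Hypothetical helper for illustration; not part of Airflow.
    """
    netloc = urlsplit(uri).netloc
    # Split on the first "@", mirroring netloc.split("@", 1) in the proposal.
    before_at, sep, after_at = netloc.partition("@")
    if sep and ":" in before_at:
        # Looks like username:password, so warn and drop it.
        warnings.warn(
            "An Asset URI should not contain auth info (e.g. username or "
            "password). It has been automatically dropped.",
            UserWarning,
            stacklevel=2,
        )
        return after_at
    # No "@" at all, or no colon before it: keep the netloc unchanged.
    return netloc


if __name__ == "__main__":
    # ABFSS-style URI: the part before "@" is a file system name, so it is kept.
    print(normalize_netloc("abfss://myfs@myaccount.dfs.core.windows.net/path"))
    # Credential-style URI: the userinfo is dropped and a warning is emitted.
    print(normalize_netloc("https://user:secret@example.com/path"))
```

One limitation of this sketch: a URI that carries only a username and no password (no colon) is also left untouched, which is the trade-off the proposal accepts in order to support ABFSS.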