SameerMesiah97 commented on code in PR #62180:
URL: https://github.com/apache/airflow/pull/62180#discussion_r2830808086
##########
airflow-core/src/airflow/models/connection.py:
##########
@@ -224,9 +226,16 @@ def _normalize_conn_type(conn_type):
return conn_type
def _parse_from_uri(self, uri: str):
+ uri_match = RE_SAFE_LOG_URI.search(uri)
+ if uri_match:
+ # Create sanitised uri for logging
+ pwd = uri_match.group(2)
+ safe_log_uri = uri.replace(pwd, "******")
+ else: # Assume no password in URI
+ safe_log_uri = uri
Review Comment:
Rather than using a regex to write your own parser, why not use
`urllib.parse`? You could do something like this:
```
from urllib.parse import urlsplit, urlunsplit

def _parse_from_uri(self, uri: str):
    parsed = urlsplit(uri)
    if parsed.password is not None:
        # Rebuild the netloc with the password masked (for logging only)
        username = parsed.username or ""
        hostname = parsed.hostname or ""
        port = f":{parsed.port}" if parsed.port is not None else ""
        masked_netloc = f"{username}:******@{hostname}{port}"
        safe_log_uri = urlunsplit(
            (
                parsed.scheme,
                masked_netloc,
                parsed.path,
                parsed.query,
                parsed.fragment,
            )
        )
    else:
        # No password in the URI, so it is safe to log as-is
        safe_log_uri = uri
```
Keep in mind that the above is just a suggested approach. It will need to be
validated and tested.
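
As a starting point for that validation, here is a minimal standalone sketch
of the same idea. The name `mask_uri_password` is hypothetical and not part of
Airflow. It assumes it is acceptable to keep the host and port exactly as
written in the URI, which avoids `parsed.hostname` lowercasing the host or
dropping IPv6 brackets:

```python
from urllib.parse import urlsplit, urlunsplit


def mask_uri_password(uri: str) -> str:
    """Return ``uri`` with any password masked (hypothetical helper)."""
    parsed = urlsplit(uri)
    if parsed.password is None:
        # No password present, so the URI can be logged unchanged.
        return uri
    # Split on the last "@" (as urllib does) and keep host[:port] verbatim.
    userinfo, _, hostinfo = parsed.netloc.rpartition("@")
    # Everything after the first ":" in the userinfo is the password.
    username = userinfo.partition(":")[0]
    masked_netloc = f"{username}:******@{hostinfo}"
    return urlunsplit(parsed._replace(netloc=masked_netloc))


masked = mask_uri_password("postgres://user:pa:ss@host:5432/db")
unchanged = mask_uri_password("sqlite:///tmp/example.db")
```

With these inputs, `masked` should be `postgres://user:******@host:5432/db`
and `unchanged` should equal its input.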
##########
airflow-core/src/airflow/models/connection.py:
##########
@@ -48,6 +48,8 @@
RE_SANITIZE_CONN_ID = re.compile(r"^[\w#!()\-.:/\\]{1,}$")
# the conn ID max len should be 250
CONN_ID_MAX_LEN: int = 250
+# Pattern to mask URI password in log strings
+RE_SAFE_LOG_URI = re.compile(r"://(.*):(.*)@(.*?)(://.*?)?(:\d+?)?(\?.*?)?")
Review Comment:
This will work in most cases, but it will break in edge cases such as a
password containing special characters like ':' or a malformed URI (e.g.
http://username:[email protected]@xyz.com). I think regex is too brittle for
this specific use case.
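
To illustrate the ':' case, here is a small sketch that applies the pattern
from this diff to a made-up URI. The helper name `mask_with_regex` and the
sample URI are assumptions for the demonstration; the masking logic mirrors
the `search` plus `replace` steps added in `_parse_from_uri`:

```python
import re
from urllib.parse import urlsplit

# Pattern copied from the diff under review.
RE_SAFE_LOG_URI = re.compile(r"://(.*):(.*)@(.*?)(://.*?)?(:\d+?)?(\?.*?)?")


def mask_with_regex(uri: str) -> str:
    """Mirror the masking logic proposed in the PR (hypothetical wrapper)."""
    uri_match = RE_SAFE_LOG_URI.search(uri)
    if uri_match:
        return uri.replace(uri_match.group(2), "******")
    return uri


# Made-up URI whose password ("pa:ss") contains a colon.
uri = "postgres://user:pa:ss@host:5432/db"

# The greedy first group swallows "user:pa", so only "ss" is masked.
regex_masked = mask_with_regex(uri)

# urllib splits the userinfo on the first ":" and sees the full password.
parsed_password = urlsplit(uri).password
```

Here `regex_masked` comes out as `postgres://user:pa:******@host:5432/db`,
which still leaks part of the password, while `parsed_password` is `pa:ss`.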