SameerMesiah97 commented on code in PR #62180:
URL: https://github.com/apache/airflow/pull/62180#discussion_r2830808086


##########
airflow-core/src/airflow/models/connection.py:
##########
@@ -224,9 +226,16 @@ def _normalize_conn_type(conn_type):
         return conn_type
 
     def _parse_from_uri(self, uri: str):
+        uri_match = RE_SAFE_LOG_URI.search(uri)
+        if uri_match:
+            # Create sanitised uri for logging
+            pwd = uri_match.group(2)
+            safe_log_uri = uri.replace(pwd, "******")
+        else:  # Assume no password in URI
+            safe_log_uri = uri

Review Comment:
   Instead of writing a custom regex-based parser, why not use `urllib.parse`? 
You could do something like this:
   
   ```
   # At module level
   from urllib.parse import urlsplit, urlunsplit

   def _parse_from_uri(self, uri: str):
       parsed = urlsplit(uri)

       if parsed.password is not None:
           # Rebuild the netloc with the password replaced by a fixed mask
           username = parsed.username or ""
           hostname = parsed.hostname or ""
           port = f":{parsed.port}" if parsed.port is not None else ""
           masked_netloc = f"{username}:******@{hostname}{port}"

           safe_log_uri = urlunsplit(
               (
                   parsed.scheme,
                   masked_netloc,
                   parsed.path,
                   parsed.query,
                   parsed.fragment,
               )
           )
       else:  # No password in the URI, so it is safe to log as-is
           safe_log_uri = uri
   ```
   Keep in mind that the above is just a suggested approach. It will need to be 
validated and tested.
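
   For experimenting with the idea outside of `Connection`, here is a minimal 
standalone sketch. The helper name `mask_uri_password` and the sample URIs are 
hypothetical and are not part of Airflow. As a variation on the snippet above, 
it splits the raw `netloc` instead of rebuilding it from `hostname` and `port`, 
on the assumption that the logged URI should otherwise stay byte-for-byte 
identical (`hostname` is lowercased and loses IPv6 brackets).

```python
from urllib.parse import urlsplit, urlunsplit


def mask_uri_password(uri: str, mask: str = "******") -> str:
    """Return ``uri`` with its password component replaced by ``mask``.

    Hypothetical helper for illustration only; not an Airflow API.
    """
    parsed = urlsplit(uri)
    if parsed.password is None:
        # No password component, so there is nothing to hide.
        return uri

    # Everything before the last "@" is userinfo, everything after is host[:port].
    userinfo, _, hostinfo = parsed.netloc.rpartition("@")
    # The username is whatever precedes the first ":" in the userinfo.
    username = userinfo.partition(":")[0]
    masked_netloc = f"{username}:{mask}@{hostinfo}"
    return urlunsplit(parsed._replace(netloc=masked_netloc))


# Made-up sample URIs
colon_pw = mask_uri_password("postgres://user:pa:ss@host:5432/db")
at_pw = mask_uri_password("http://user:p@ss@host/")
ipv6 = mask_uri_password("http://u:p@[::1]:8080/x")
no_pw = mask_uri_password("sqlite:///tmp/db.sqlite")
```

   This still relies on `urlsplit` accepting the input, so malformed URIs would 
need their own handling and tests.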



##########
airflow-core/src/airflow/models/connection.py:
##########
@@ -48,6 +48,8 @@
 RE_SANITIZE_CONN_ID = re.compile(r"^[\w#!()\-.:/\\]{1,}$")
 # the conn ID max len should be 250
 CONN_ID_MAX_LEN: int = 250
+# Pattern to mask URI password in log strings
+RE_SAFE_LOG_URI = re.compile(r"://(.*):(.*)@(.*?)(://.*?)?(:\d+?)?(\?.*?)?")

Review Comment:
   This will work for most cases but will break on edge cases, such as a password 
containing non-standard characters like ':' or a malformed URI (e.g. 
http://username:[email protected]@xyz.com). I think regex is too brittle for 
this specific use case.
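
   To make the ':' case concrete, here is a small reproduction. The pattern is 
copied from the diff; the `regex_mask` wrapper and the sample URI are made up 
for illustration and only mirror the search-then-replace logic in the PR.

```python
import re
from urllib.parse import urlsplit

# Pattern copied from the diff under review.
RE_SAFE_LOG_URI = re.compile(r"://(.*):(.*)@(.*?)(://.*?)?(:\d+?)?(\?.*?)?")


def regex_mask(uri: str) -> str:
    """Mirror the PR's approach: find the password group, then str.replace it."""
    match = RE_SAFE_LOG_URI.search(uri)
    if not match:
        return uri
    return uri.replace(match.group(2), "******")


# Made-up URI whose password is "pa:ss"
uri = "postgres://user:pa:ss@host:5432/db"

masked = regex_mask(uri)
parsed_password = urlsplit(uri).password

print(masked)           # postgres://user:pa:******@host:5432/db
print(parsed_password)  # pa:ss
```

   The greedy first group swallows `user:pa`, so only `ss` is treated as the 
password and `pa` ends up in the log, whereas `urlsplit` reports the whole 
`pa:ss` as the password.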



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
