uranusjr commented on code in PR #40173:
URL: https://github.com/apache/airflow/pull/40173#discussion_r1675179047


##########
airflow/datasets/__init__.py:
##########
@@ -66,7 +73,8 @@ def _sanitize_uri(uri: str) -> str:
     parsed = urllib.parse.urlsplit(uri)
     if not parsed.scheme and not parsed.netloc:  # Does not look like a URI.
         return uri
-    normalized_scheme = parsed.scheme.lower()
+    if (normalized_scheme := _get_normalized_scheme(uri)) is None:
+        return uri

Review Comment:
   … while you could just do
   
   ```python
   if not (normalized_scheme := parsed.scheme.lower()):
       return uri
   ```
   



##########
airflow/datasets/__init__.py:
##########
@@ -50,6 +50,13 @@ def _get_uri_normalizer(scheme: str) -> 
Callable[[SplitResult], SplitResult] | N
     return ProvidersManager().dataset_uri_handlers.get(scheme)
 
 
+def _get_normalized_scheme(uri: str) -> str | None:
+    parsed = urllib.parse.urlsplit(uri)
+    if not parsed.scheme:
+        return None

Review Comment:
   Why not just return an empty string? That’s the default behaviour of 
urlsplit. It’s weird you take the effort to define a new function to check the 
default behaviour, convert it to None, and then introduce an extra check for 
that new None possibility you introduced yourself.



##########
airflow/datasets/__init__.py:
##########
@@ -182,6 +190,28 @@ def __eq__(self, other: Any) -> bool:
     def __hash__(self) -> int:
         return hash(self.uri)
 
+    @property
+    def normalized_uri(self) -> str | None:
+        """
+        Returns the normalized and AIP-60 compliant URI whenever possible.
+
+        If we can't retrieve the scheme from URI or no normalizer is provided 
or if parsing fails,
+        it returns None.
+
+        If a normalizer for the scheme exists and parsing is successful we 
return the normalizer result.
+        """
+        if (normalized_scheme := _get_normalized_scheme(self.uri)) is None:
+            return None

Review Comment:
   and same here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to