uranusjr commented on code in PR #25280:
URL: https://github.com/apache/airflow/pull/25280#discussion_r938430630


##########
airflow/models/dag.py:
##########
@@ -2781,6 +2801,20 @@ def validate_schedule_and_params(self):
                     "DAG Schedule must be None, if there are any required params without default values"
                 )
 
+    def validate_owner_links(self) -> Dict[str, str]:
+        """Parses a given link, and verifies if it's a valid URL, or a 'mailto' link"""
+        wrong_links = {}
+        for owner, link in self.owner_links.items():
+            result = urlparse(link)
+            if link.startswith('mailto:'):
+                # netloc is not existing for 'mailto' link, so we are checking that the path is parsed
+                if not all([result.scheme, result.path]):
+                    wrong_links.update({owner: link})
+            elif not all([result.scheme, result.netloc]):
+                wrong_links.update({owner: link})
+
+        return wrong_links

Review Comment:
   This can be rewritten like this:
   
   ```suggestion
       def iter_invalid_owner_links(self) -> Iterator[Tuple[str, str]]:
           """Parse the owner links and check that each is a valid URL or a 'mailto' link.
   
           Returns an iterator of invalid (owner, link) pairs.
           """
           for owner, link in self.owner_links.items():
               result = urlsplit(link)
               if result.scheme == "mailto":
                   # A 'mailto' link has no netloc, so check that the path (the address) was parsed instead
                   if not result.path:
                       yield owner, link
               elif not result.scheme or not result.netloc:
                   yield owner, link
   ```
   
   And call it like this: `wrong_links = dict(self.iter_invalid_owner_links())`. A few things to note:
   
   1. Using a generator avoids building a dict inside the method, which is slightly more memory-efficient; the caller decides how to collect the results.
   2. `urlsplit` is preferred over `urlparse` these days unless there’s a concrete reason to use the latter (such as needing the `;params` component, which only `urlparse` splits out), [according to the documentation](https://docs.python.org/3/library/urllib.parse.html#urllib.parse.urlsplit).
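   3. Since we already parsed the link, it’s better to rely on the parsed result than to check the raw string for a prefix. The two can disagree: `urlsplit` lowercases the scheme, so `MAILTO:...` gives `result.scheme == "mailto"` but fails `link.startswith('mailto:')`. Using `result` removes that worry altogether.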
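
A standalone sketch of the suggested method, so it can be tried outside `DAG`. As an assumption for illustration, it is written as a hypothetical module-level function that takes the owner-to-link mapping as a plain dict instead of reading `self.owner_links`, and the sample links are made up. It also shows the `dict(...)` call and includes an upper-case `MAILTO:` link, which the parsed scheme accepts.

```python
from typing import Dict, Iterator, Tuple
from urllib.parse import urlsplit


def iter_invalid_owner_links(owner_links: Dict[str, str]) -> Iterator[Tuple[str, str]]:
    """Yield (owner, link) pairs whose link is neither a valid URL nor a 'mailto' link.

    Hypothetical free-function version of the suggested DAG method.
    """
    for owner, link in owner_links.items():
        result = urlsplit(link)
        if result.scheme == "mailto":
            # A 'mailto' link has no netloc, so require a non-empty path (the address)
            if not result.path:
                yield owner, link
        elif not result.scheme or not result.netloc:
            # Any other link needs both a scheme (e.g. "https") and a network location
            yield owner, link


# Made-up sample data for illustration only
owner_links = {
    "alice": "https://example.com/alice",
    "bob": "mailto:bob@example.com",
    "carol": "MAILTO:carol@example.com",  # urlsplit lowercases the scheme to "mailto"
    "dave": "example.com/dave",  # no scheme, so everything lands in the path
    "erin": "mailto:",  # 'mailto' link without an address
}

wrong_links = dict(iter_invalid_owner_links(owner_links))
print(wrong_links)  # → {'dave': 'example.com/dave', 'erin': 'mailto:'}
```

Because the function is a generator, a caller that only needs a yes/no answer could use `any(iter_invalid_owner_links(...))` and stop at the first invalid link.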


