uranusjr commented on code in PR #25280:
URL: https://github.com/apache/airflow/pull/25280#discussion_r938430630
##########
airflow/models/dag.py:
##########
@@ -2781,6 +2801,20 @@ def validate_schedule_and_params(self):
                     "DAG Schedule must be None, if there are any required params without default values"
                 )
+    def validate_owner_links(self) -> Dict[str, str]:
+        """Parses a given link, and verifies if it's a valid URL, or a 'mailto' link"""
+        wrong_links = {}
+        for owner, link in self.owner_links.items():
+            result = urlparse(link)
+            if link.startswith('mailto:'):
+                # netloc is not existing for 'mailto' link, so we are checking that the path is parsed
+                if not all([result.scheme, result.path]):
+                    wrong_links.update({owner: link})
+            elif not all([result.scheme, result.netloc]):
+                wrong_links.update({owner: link})
+
+        return wrong_links
Review Comment:
This can be rewritten like this:
```suggestion
    def iter_invalid_owner_links(self) -> Iterator[Tuple[str, str]]:
        """Parse each owner link and verify it is a valid URL or a 'mailto' link.

        Returns an iterator of invalid (owner, link) pairs.
        """
        for owner, link in self.owner_links.items():
            result = urlsplit(link)
            if result.scheme == "mailto":
                # 'mailto' links have no netloc, so check that the path was parsed instead
                if not result.path:
                    yield owner, link
            elif not result.scheme or not result.netloc:
                yield owner, link
```
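The caller would then collect the results with `wrong_links = dict(self.iter_invalid_owner_links())`. (`Iterator` and `Tuple` from `typing`, and `urlsplit` from `urllib.parse`, need to be imported at module level if they aren’t already.)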
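To sanity-check the suggested logic outside of `DAG`, here is a minimal standalone sketch. It is a simplification under stated assumptions: it takes a plain owner-to-link dict instead of reading `self.owner_links`, and the sample owners and links are made up.

```python
from typing import Dict, Iterator, Tuple
from urllib.parse import urlsplit


# Hypothetical standalone variant: takes the owner -> link mapping as an
# argument instead of reading self.owner_links from a DAG.
def iter_invalid_owner_links(owner_links: Dict[str, str]) -> Iterator[Tuple[str, str]]:
    """Yield (owner, link) pairs whose link is neither a URL with a host nor a 'mailto' link."""
    for owner, link in owner_links.items():
        result = urlsplit(link)
        if result.scheme == "mailto":
            # 'mailto' links have no netloc; the address is parsed into the path.
            if not result.path:
                yield owner, link
        elif not result.scheme or not result.netloc:
            # Any other link needs both a scheme and a host to be considered valid.
            yield owner, link


# Made-up sample data for illustration only.
owner_links = {
    "airflow": "https://airflow.apache.org",
    "alice": "mailto:alice@example.com",
    "bob": "mailto:",
    "carol": "airflow.apache.org",
}
wrong_links = dict(iter_invalid_owner_links(owner_links))
print(wrong_links)  # {'bob': 'mailto:', 'carol': 'airflow.apache.org'}
```

Because it is a generator, a caller that only cares whether any link is invalid can stop at the first hit with `next(iter_invalid_owner_links(owner_links), None)` instead of building the whole dict.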
A few things to note:
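1. A generator yields invalid pairs lazily instead of building a dict inside the method, which is slightly more memory-efficient and lets the caller decide how to consume the results.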
2. `urlsplit` is preferred over `urlparse` these days unless there’s a concrete reason to need `urlparse`’s separate `params` component, [according to the documentation](https://docs.python.org/3/library/urllib.parse.html#urllib.parse.urlsplit).
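3. Since we already parsed the link, it’s better to rely on the parsed result than to check the raw link prefix. For example, `urlsplit` lowercases the scheme, so `MAILTO:owner@example.com` is recognized as `mailto`, while `link.startswith('mailto:')` would miss it. Using `result` removes that kind of mismatch altogether.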
--
This is an automated message from the Apache Git Service.