uranusjr commented on code in PR #52876:
URL: https://github.com/apache/airflow/pull/52876#discussion_r2239183567


##########
airflow-core/src/airflow/dag_processing/bundles/manager.py:
##########
@@ -81,6 +83,54 @@ def _add_example_dag_bundle(config_list):
     )
 
 
+def _is_safe_bundle_url(url: str) -> bool:
+    """
+    Check if a bundle URL is safe to use.
+
+    This function validates that the URL:
+    - Uses HTTP or HTTPS schemes (no JavaScript, data, or other schemes)
+    - Is properly formatted
+    - Doesn't contain malicious content
+    """
+    from urllib.parse import urlparse
+
+    if not url:
+        return False
+
+    try:
+        parsed = urlparse(url)
+        if parsed.scheme not in {"http", "https"}:
+            return False
+
+        if not parsed.netloc:
+            return False
+
+        if ";" in url:
+            return False

Review Comment:
   Semicolon is a valid character in a URL and has special meaning in query 
strings, although not used much these days (generally `&` is preferred) so 
banning it could break some things. I’m inclined to not treat it specially 
since it’s a user config (i.e. modifying it requires admin permission).
   
   Honestly I’m inclined to not do any validation at all instead since your 
system is already fried if someone can put malicious content in this value. But 
some sanity checks (like checking for control characters) is not a bad idea too.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to