kaxil commented on code in PR #64180:
URL: https://github.com/apache/airflow/pull/64180#discussion_r2983634829


##########
providers/google/src/airflow/providers/google/marketing_platform/operators/bid_manager.py:
##########
@@ -22,6 +22,7 @@
 import json
 import shutil
 import tempfile
+import urllib.parse

Review Comment:
   The PR adds `import urllib.parse` but line 28 already has `from urllib.parse 
import urlsplit`. After this change you have both styles in the same file. 
Simpler to add `urlparse` to the existing import: `from urllib.parse import 
urlparse, urlsplit`, then use bare `urlparse(file_url)` below.



##########
providers/google/src/airflow/providers/google/marketing_platform/operators/bid_manager.py:
##########
@@ -317,8 +318,15 @@ def execute(self, context: Context):
 
         # If no custom report_name provided, use Bid Manager name
         file_url = resource["metadata"]["googleCloudStoragePath"]
-        if urllib.parse.urlparse(file_url).scheme == "file":
-            raise AirflowException("Accessing local file is not allowed in 
this operator")
+        parsed_url = urllib.parse.urlparse(file_url)

Review Comment:
   Nice switch from blocklist to allowlist. One thing to consider: 
`urllib.request.urlopen` (line 334 below) follows HTTP redirects by default. 
The validation here checks the initial URL, but a 302 from a GCS host could 
theoretically redirect to an arbitrary destination. In practice GCS doesn't do 
that, but for a security-focused change, either a comment explaining why this 
is acceptable or a custom opener that rejects redirects would close the gap.



##########
providers/google/tests/unit/google/marketing_platform/operators/test_bid_manager.py:
##########
@@ -74,7 +74,16 @@ def teardown_method(self):
             session.execute(delete(TI))
 
     @pytest.mark.parametrize(
-        ("file_path", "should_except"), [("https://host/path";, False), 
("file:/path/to/file", True)]
+        ("file_path", "should_except"),
+        [
+            ("https://storage.googleapis.com/bucket/report.csv";, False),
+            ("https://storage.cloud.google.com/bucket/report.csv";, False),
+            ("file:/path/to/file", True),
+            ("http://storage.googleapis.com/bucket/report.csv";, True),
+            ("https://evil.com/report.csv";, True),
+            ("https://internal-service.local/secret";, True),
+            ("ftp://storage.googleapis.com/bucket/report.csv";, True),

Review Comment:
   Good coverage. Worth adding a userinfo-based bypass attempt too, something 
like `("https://[email protected]/report.csv";, True)`. `urlparse` 
handles this correctly (hostname resolves to `evil.com`), but having it in the 
test suite documents awareness of that attack vector.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to