kaxil commented on code in PR #62701:
URL: https://github.com/apache/airflow/pull/62701#discussion_r2874179444


##########
task-sdk/src/airflow/sdk/io/path.py:
##########
@@ -111,13 +111,21 @@ def __init__(
         # override conn_id if explicitly provided
         if conn_id is not None:
             storage_options["conn_id"] = conn_id
+
+        # pop conn_id before calling super to prevent it from being passed
+        # to the underlying fsspec filesystem, which doesn't understand it
+        self._conn_id = storage_options.pop("conn_id", None)

Review Comment:
   I think this may break derived paths (`/`, `joinpath`).
`UPath.with_segments()` constructs children with
`type(self)(..., **self.storage_options)`, and `__vfspath__()` for absolute
paths uses `self.path` (without `conn_id`). After popping `conn_id` here, child
paths may lose it.
   
   Repro:
   `base = ObjectStoragePath("s3://aws_default@bucket/prefix")`
   `child = base / "x"`
   `child.conn_id` may become `None`.
   
   Could we preserve `conn_id` for segment operations (for example by 
overriding `with_segments`/`joinpath` to pass `conn_id=self.conn_id`)?
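   
   A minimal sketch of what I mean, using hypothetical stand-in classes 
(`BasePath` and `ConnPath` are not the real `UPath`/`ObjectStoragePath`, and the 
real constructor signatures may differ). It only assumes that children are 
rebuilt from `storage_options`, as described above:

```python
class BasePath:
    """Hypothetical stand-in for UPath; not the real class."""

    def __init__(self, *segments, **storage_options):
        self.segments = tuple(segments)
        self.storage_options = dict(storage_options)

    def with_segments(self, *segments):
        # Assumed behaviour: children are rebuilt from storage_options only.
        return type(self)(*segments, **self.storage_options)

    def __truediv__(self, other):
        return self.with_segments(*self.segments, other)


class ConnPath(BasePath):
    """Hypothetical stand-in for ObjectStoragePath."""

    def __init__(self, *segments, conn_id=None, **storage_options):
        if conn_id is not None:
            storage_options["conn_id"] = conn_id
        # Keep conn_id on the instance and out of storage_options.
        self._conn_id = storage_options.pop("conn_id", None)
        super().__init__(*segments, **storage_options)

    @property
    def conn_id(self):
        return self._conn_id

    def with_segments(self, *segments):
        # Re-inject conn_id explicitly, since storage_options no longer has it.
        return type(self)(*segments, conn_id=self._conn_id, **self.storage_options)


base = ConnPath("bucket", "prefix", conn_id="aws_default")
child = base / "x"
```

   Without the `with_segments` override in `ConnPath`, `child.conn_id` would 
be `None` in this sketch, which is the regression I am worried about.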



##########
task-sdk/src/airflow/sdk/io/path.py:
##########
@@ -111,13 +111,21 @@ def __init__(
         # override conn_id if explicitly provided
         if conn_id is not None:
             storage_options["conn_id"] = conn_id
+
+        # pop conn_id before calling super to prevent it from being passed
+        # to the underlying fsspec filesystem, which doesn't understand it
+        self._conn_id = storage_options.pop("conn_id", None)

Review Comment:
   Can we also add a regression test for path derivation here?
   
   1) `ObjectStoragePath("fake://my_conn@bucket/key") / "sub"` keeps
`conn_id == "my_conn"` and stringifies with `my_conn@...`
   2) `conn_id` is still not passed to the fsspec filesystem constructor.
   
   That should cover both the original bug and propagation regressions.
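   
   Roughly the shape I have in mind, sketched against hypothetical stand-ins 
(`RecordingFS` and `FakePath` are illustrative only; the real test would use 
`ObjectStoragePath` and a fake fsspec filesystem, whose setup is not shown 
here):

```python
class RecordingFS:
    """Hypothetical fake filesystem that records its constructor kwargs."""

    calls = []

    def __init__(self, **kwargs):
        RecordingFS.calls.append(kwargs)


class FakePath:
    """Hypothetical stand-in for ObjectStoragePath, for illustration only."""

    def __init__(self, url, conn_id=None, **storage_options):
        protocol, _, rest = url.partition("://")
        userinfo, at, location = rest.rpartition("@")
        self.protocol = protocol
        self.location = location
        # An explicit conn_id wins; otherwise take it from the URL userinfo.
        self.conn_id = conn_id if conn_id is not None else (userinfo if at else None)
        self.storage_options = storage_options
        # conn_id is deliberately not forwarded to the filesystem.
        self.fs = RecordingFS(**storage_options)

    def __truediv__(self, name):
        bare = f"{self.protocol}://{self.location}/{name}"
        return type(self)(bare, conn_id=self.conn_id, **self.storage_options)

    def __str__(self):
        prefix = f"{self.conn_id}@" if self.conn_id else ""
        return f"{self.protocol}://{prefix}{self.location}"


def test_child_keeps_conn_id():
    child = FakePath("fake://my_conn@bucket/key") / "sub"
    assert child.conn_id == "my_conn"
    assert str(child) == "fake://my_conn@bucket/key/sub"


def test_conn_id_not_passed_to_filesystem():
    RecordingFS.calls.clear()
    FakePath("fake://my_conn@bucket/key") / "sub"
    assert RecordingFS.calls  # the filesystem was actually constructed
    assert all("conn_id" not in kwargs for kwargs in RecordingFS.calls)
```

   The exact string form asserted here belongs to the stand-in; the real 
assertion should match whatever `ObjectStoragePath` produces.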



##########
task-sdk/src/airflow/sdk/io/path.py:
##########
@@ -99,7 +99,7 @@ def __init__(
         if args:
             arg0 = args[0]
             if isinstance(arg0, type(self)):
-                storage_options["conn_id"] = arg0.storage_options.get("conn_id")
+                storage_options["conn_id"] = arg0.conn_id

Review Comment:
   Small robustness nit: should this fall back to 
`arg0.storage_options.get("conn_id")` when `_conn_id` is absent, so that cloning 
mixed old/new instances in the same process does not drop `conn_id`?
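   
   Something along these lines, as a sketch only (`resolve_conn_id` is a 
hypothetical helper name, and the `SimpleNamespace` objects merely imitate old 
and new path instances):

```python
from types import SimpleNamespace

_MISSING = object()


def resolve_conn_id(source):
    """Prefer the private attribute; fall back to storage_options if it is absent."""
    conn_id = getattr(source, "_conn_id", _MISSING)
    if conn_id is _MISSING:
        # Older instance: conn_id still lives in storage_options.
        return source.storage_options.get("conn_id")
    return conn_id


# Imitation of an instance created after this change.
new_style = SimpleNamespace(_conn_id="my_conn", storage_options={})
# Imitation of an instance created before this change.
old_style = SimpleNamespace(storage_options={"conn_id": "my_conn"})
```

   Both `resolve_conn_id(new_style)` and `resolve_conn_id(old_style)` return 
`"my_conn"` here, so a clone would keep the connection either way.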



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

