bolkedebruin commented on PR #34729:
URL: https://github.com/apache/airflow/pull/34729#issuecomment-1745354692
> Design wise: I'm not sure about the `fs.mount` affecting global state --
> could we instead do something like this:
>
> ```python
> s3fs = afs.mount("s3://warehouse", "/mnt/warehouse", conn_id="my_aws")
> localfs = afs.mount("file:///tmp", "/mnt/tmp")
>
> @task
> def load_file(src):
>     with s3fs.open(src) as f:
>         f.read()
> ```
>
> Edit: clearly we "could", so I guess I'm asking what was your thinking for
> choosing the current way?
>
> Implementation wise I think the `_adlfs` etc methods should live in the
> azure etc. provider -- that way new providers can add in their own
> implementation.
Can you explain what your intention for local would be? The idea I have is a
global scope and a task-local scope: if you mount something within a task, it
doesn't propagate (by default?) to downstream tasks, while global mounts are
available across all tasks.
The global scope is useful when you intend to manipulate files across your
tasks, and it also allows for easy CI/CD, for example. Note btw that you can
do this:
```python
afs.mount("s3://warehouse", "/mnt/s3")
afs.mount("gcs://storage", "/mnt/gcs")
afs.copy("/mnt/s3/data", "/mnt/gcs")
```
and it will figure out how to handle the copy properly behind the scenes.
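Roughly, that means `copy` resolves each mount point back to its backing URL and then picks a transfer strategy from the two schemes. A toy sketch of that dispatch (the `MOUNTS` table, `resolve`, and `copy` here are hypothetical, not the PR's implementation):

```python
# Toy sketch: map mounted paths back to their source URLs, then choose
# a copy strategy based on whether both ends share a protocol.
MOUNTS = {"/mnt/s3": "s3://warehouse", "/mnt/gcs": "gcs://storage"}


def resolve(path):
    """Translate a mounted path into the URL it is backed by."""
    for mount_point, source in MOUNTS.items():
        if path == mount_point or path.startswith(mount_point + "/"):
            rel = path[len(mount_point):].lstrip("/")
            return source.rstrip("/") + ("/" + rel if rel else "")
    return path  # not under any mount: pass through unchanged


def copy(src, dst):
    """Pick a transfer strategy for the resolved endpoints."""
    src_url, dst_url = resolve(src), resolve(dst)
    src_proto = src_url.split("://")[0]
    dst_proto = dst_url.split("://")[0]
    if src_proto == dst_proto:
        # Same backend: a server-side copy avoids moving bytes locally.
        return f"server-side copy within {src_proto}"
    # Different backends: stream through the worker.
    return f"stream {src_url} -> {dst_url}"


assert resolve("/mnt/s3/data") == "s3://warehouse/data"
assert copy("/mnt/s3/data", "/mnt/gcs") == "stream s3://warehouse/data -> gcs://storage"
```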
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]