uranusjr commented on PR #34729:
URL: https://github.com/apache/airflow/pull/34729#issuecomment-1748680743
What I’m envisioning is something like this:
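```python
import pendulum

from airflow.decorators import dag, task

# `afs` is the proposed filesystem module under discussion; `mount` is not a released API.
warehouse_mnt = afs.mount("s3://warehouse")  # Can take a conn_id too; that's orthogonal.
output_mnt = afs.mount("file:///tmp")


@dag(schedule=None, start_date=pendulum.datetime(2023, 1, 1))
def my_dag():
    @task
    def load_file(src):
        # `src` is a Path-like object; the mount details stay encapsulated inside it.
        with afs.open(src) as f:
            f.read()

    load_file(warehouse_mnt / "my_data.csv")


my_dag()
```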
Instead of exposing the mount to the user, we encapsulate the data inside
the Mount object and expose a Path-like interface that lets the user operate on it
directly. You can still work with the mount directly if you need to, either by
passing a mount point explicitly to `mount` or by accessing `mnt.mount_location`
(or whatever we call it; it returns the location as a string) and working with that.
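To make the Path-like idea a bit more concrete, here is a minimal, self-contained sketch. `Mount`, `MountPath`, and the module-level `mount()` function are hypothetical stand-ins for the proposed `afs.mount` (nothing here touches real storage); the only point is that `mnt / "name"` builds a path object, while `mount_location` stays available as an escape hatch.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import PurePosixPath
from urllib.parse import urlsplit, urlunsplit


@dataclass(frozen=True)
class Mount:
    """Hypothetical mount object: wraps a storage URL behind a Path-like API."""

    url: str

    @property
    def mount_location(self) -> str:
        # Escape hatch: the underlying location as a plain string.
        return self.url

    def __truediv__(self, name: str) -> MountPath:
        # `mnt / "file.csv"` yields a path object relative to this mount.
        return MountPath(self, PurePosixPath(name))


@dataclass(frozen=True)
class MountPath:
    """Hypothetical Path-like handle to a resource inside a Mount."""

    mount: Mount
    relative: PurePosixPath

    def __truediv__(self, name: str) -> MountPath:
        # Allow chaining: `mnt / "dir" / "file.csv"`.
        return MountPath(self.mount, self.relative / name)

    def __str__(self) -> str:
        # Join the mount URL's path component with the relative path.
        parts = urlsplit(self.mount.url)
        base = PurePosixPath(parts.path or "/")
        return urlunsplit(parts._replace(path=str(base / self.relative)))


def mount(url: str) -> Mount:
    # Hypothetical stand-in for the proposed `afs.mount`.
    return Mount(url)


warehouse_mnt = mount("s3://warehouse")
print(warehouse_mnt / "my_data.csv")
```

A real implementation would presumably also carry the connection and implement `open()` on the path object; that part is left out here.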
For the Dataset part, what I have in mind now is pretty simple: just make the Mount
object inherit from Dataset (or _be_ a Dataset?), so the same object can serve
both purposes without duplicating the URL when you need both. It's not hugely
useful, but the two are really the same idea (a reference to some resource), and I
feel they shouldn't be two separate things.
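A rough sketch of the "Mount is a Dataset" option, assuming a plain dataclass stand-in for `airflow.datasets.Dataset` (only its `uri` field is modelled; the real class has more to it). `Mount` here is hypothetical; the point is that the URI is stored once and reused as the mount location.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    """Stand-in for airflow.datasets.Dataset; only `uri` is modelled here."""

    uri: str


@dataclass(frozen=True)
class Mount(Dataset):
    """Hypothetical: a mount *is* a Dataset, so the URI lives in one place."""

    @property
    def mount_location(self) -> str:
        # The same string serves as both the Dataset URI and the mount location.
        return self.uri


warehouse = Mount("s3://warehouse")
```

If something along these lines held, a single `warehouse` object could in principle be listed in a task's `outlets` and also be used to read and write data, which is the deduplication described above.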