bolkedebruin commented on PR #34729:
URL: https://github.com/apache/airflow/pull/34729#issuecomment-1748640426

   > Is the `/mnt` path necessary? I wonder if we should instead invisibly 
mount the fs to somewhere random by default and combine the concept with, say, 
Dataset, to help abstract away the URL string. This might take some cognitive 
load off the users like having a global `pathlib.Path` allows writing relative 
paths.
   > 
   
   The `/mnt` path isn't required. It's arbitrary: you could use `/whatever`, 
except that (for now) it cannot be nested within another mount point. I'm not 
sure mounting somewhere random makes sense (what would be the point?), but what 
could be done is to have the functions check for absolute paths, like 
`s3://warehouse` or `file:///etc`, so that those work with `afs.open()`, among 
others, as well. That _would_ make sense.
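
To illustrate the check for absolute paths mentioned above, here is a minimal sketch. It is not part of this PR: the helper name `is_absolute_url` is hypothetical, and it assumes that "absolute" means the string carries a URL scheme followed by `://`.

```python
from urllib.parse import urlsplit


def is_absolute_url(path: str) -> bool:
    """Return True for URL-style paths such as 's3://warehouse' or 'file:///etc'."""
    # A URL-style path has a scheme and the '://' separator; a mount-style
    # path such as '/mnt/warehouse/data.gz' has neither.
    return "://" in path and bool(urlsplit(path).scheme)
```

A function like `afs.open` could use such a check to decide whether to go through the mount points or to address the backing store directly.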
   
   > Also one problem I have to get my head around this is the example in the 
top post isn’t very complete. For example, what is `src` in the `@task` example 
supposed to be? Why do we need to explicitly mount, instead of it being done 
automatically by `afs.open`?
   
   `src` can be any path that is prefixed with a mount point. In the top post 
it could be `src = '/mnt/warehouse/data.gz'`. The idea behind having mount 
points is that it becomes much easier to test and to separate operations from 
development. Your code can remain the same throughout, and you would just need 
to adjust the source of the mount to make it work in a different setting. 
   
   ```python
   
   afs.mount("s3://warehouse", "/warehouse")
   
   # can become
   
   afs.mount("file:///tmp/warehouse", "/warehouse", remount=True)
   
   # code below remains the same
   ```
   
   As mentioned above, it does make sense to have `afs.open` accept absolute 
paths as well, so you can work with a different pattern.
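
To make the mount-point idea concrete, here is a toy sketch of how a path prefixed with a mount point could be resolved to its source. It is not the `afs` implementation from this PR: the `MountTable` class and its `resolve` method are hypothetical, and rejecting nesting in both directions is an assumption on my part.

```python
import posixpath


class MountTable:
    """Toy registry mapping mount points to source URLs (hypothetical)."""

    def __init__(self) -> None:
        self._mounts: dict[str, str] = {}

    def mount(self, source: str, mount_point: str, remount: bool = False) -> None:
        mount_point = posixpath.normpath(mount_point)
        if mount_point in self._mounts:
            # Replacing an existing mount requires an explicit remount=True.
            if not remount:
                raise ValueError(f"{mount_point} is already mounted")
        else:
            for existing in self._mounts:
                # Assumption: nesting is rejected in both directions.
                if (mount_point.startswith(existing + "/")
                        or existing.startswith(mount_point + "/")):
                    raise ValueError(f"{mount_point} nests with {existing}")
        self._mounts[mount_point] = source

    def resolve(self, path: str) -> str:
        """Translate a mount-prefixed path into a path on its source."""
        path = posixpath.normpath(path)
        for mount_point, source in self._mounts.items():
            if path == mount_point or path.startswith(mount_point + "/"):
                return source + path[len(mount_point):]
        raise ValueError(f"{path} is not under any mount point")


table = MountTable()
table.mount("s3://warehouse", "/warehouse")
production = table.resolve("/warehouse/data.gz")  # "s3://warehouse/data.gz"

# Swap the source for local testing; the path used by the task stays the same.
table.mount("file:///tmp/warehouse", "/warehouse", remount=True)
local = table.resolve("/warehouse/data.gz")  # "file:///tmp/warehouse/data.gz"
```

The task code only ever sees `/warehouse/data.gz`; which backend serves it depends solely on the mount call.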
   
   

