uranusjr commented on PR #34729:
URL: https://github.com/apache/airflow/pull/34729#issuecomment-1750230494
I thought about the overall design some more. Aggregated comments on
multiple things.
re: `__truediv__` returning str. In `pathlib`, `Path.__truediv__` returns
another Path instance so you can do e.g. `path / "foo" / "bar"`, which users
would likely expect. I thought this should return another Mount object, but you
pointed out the thing I got wrong, Mount does not represent a file endpoint,
but more like a volume, so `__truediv__` should return something of another
type similar to Path instead. At this point I feel we should leave that part
out of the initial implementation and focus on getting other things right (see
below).
re: `unmount`. I’m guessing, but perhaps this is similar to @ashb’s concern
on global state. Currently it is not at all clear when the user _can_ call
cleanup, due to how `@task` functions are actually executed in separate
workers, and how global variables in the DAG file work may not be obvious to
users. I feel Airflow should either require a more explicit approach such as
```python
m = Mount("s3://...") # Does not register anything but simply creates an
object in memory.
@task
def foo():
with m: # Actually registers the mount.
```
or be magically smarter about this and control the mount registration only
inside a particular task context instead. We can draw inspiration from other
platforms, such as how Pytest uses argument names to infer fixtures (which have
a similar context problem), or how Dagster has `@asset(deps=...)` to define the
task needs certain resources in the context.
Finally, I think this needs an AIP to document the various design decisions
and better describe the high-level view.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]