bolkedebruin commented on PR #35612:
URL: https://github.com/apache/airflow/pull/35612#issuecomment-1809852382
> @bolkedebruin Are you planning to expose `storage_options` in this PR or
separate?
>
> > Work is underway to allow storage_options to be passed
> > directly to the underlying filesystem.
>
> I would have loved for you to have separate PRs for:
>
> 1. Moving from `airflow.io.store.path` -> `airflow.io.path`
>
> 2. Change from `os.Pathlike` -> `upath.CloudPath`
>
> 3. Expose `storage_options`
>
>
> That way it is easier to review, the diff currently is big :)
Well, after this feedback I cannot put the `storage_options` in this PR eh
;-). But you are right, it should have been smaller. I'll add here some
considerations and I hope you can review it as is. I mostly have what you want
in separate commits in this pR.
**Deriving from Path / upath.CloudPath**
pathlib.Path makes it very hard to inherit from and it even does that
intentionally by (poorly motivated?) design (This probably changes in python
3.13). So, when I started out with ObjectStoragePath I decided to make it
`os.Pathlike`. The downside is that it has a pretty weak API and doesn't
specify what methods should be available. So I, well intended, just added a few
that seemed to make sense by looking at the `Path` interface and adding some
for convenience.
Enter `upath`. upath does derive from `Path` and therefore has a stricter
API. It still has the downside of being difficult to derive from, hence the
`__new__` refactor, but it has implemented the semantics that we need. The
refactor of ObjectStoragePath therefore keeps the API the same as before minus
`find` and `du` and `walk`. `walk` might be re-added again as it is - i think -
useful for the dag processor in the future.
_Considerations_
* ObjectStoragePath is a more opinionated version of `upath.CloudPath` and
it is the only implementation of Path that supports the Airflow-isms (conn_id,
getting init from providers etc, serialization). While you can `attach` other
filesystems (e.g. `dbfs` or `webhdfs`) the semantics will be as an object
store. `upath` does implement different semantics if required, not sure of that
is useful for us.
* Naming: some discussion happened with @uranusjr on whether
ObjectStoragePath is the right name. CloudStoragePath could be al alternave,
but it rubs me the wrong way somehow.
**airflow.io.store.path --> airflow.io.path**
@jscheffl was already in favor of moving it up the hierarchy, I think it has
a more natural fit here and unties it from `store`
which is still there, but not as important.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]