Hi everyone,
I've been experimenting with `ObjectStoragePath` and recently opened a
[PR](https://github.com/apache/airflow/pull/52002) aiming to simplify
its construction using Airflow connections — especially in cases where
environments (e.g., dev, staging, prod) differ primarily in object
storage provider (e.g., S3, GCS, file) and base path.
The goal was to construct a reusable root path from a connection like this:
```python
from airflow.hooks.base import BaseHook
from airflow.io.path import ObjectStoragePath

storage = ObjectStoragePath.from_conn(BaseHook.get_connection("storage"))
path = storage / "my_file.txt"
```
...without needing to hardcode schemes like `s3://` or `gs://` and
base paths (usually "buckets") into the DAG code. The idea was to
infer provider and base path from connection `extra` fields (e.g.,
`provider`, `base_path`), allowing the same DAG code to work across
environments by simply reconfiguring the connection.
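To make the idea concrete, here is a rough sketch of the same thing expressed with today's API: a small helper that reads the scheme and base path from the connection's extras and builds the root `ObjectStoragePath`. The `provider` and `base_path` extra fields and the `storage_root` helper are only assumptions for illustration, not an existing Airflow API:
```python
from airflow.hooks.base import BaseHook
from airflow.io.path import ObjectStoragePath


def storage_root(conn_id: str) -> ObjectStoragePath:
    """Hypothetical helper: build a root path from a connection's extras."""
    conn = BaseHook.get_connection(conn_id)
    extra = conn.extra_dejson
    scheme = extra["provider"]      # assumed extra field, e.g. "s3", "gs", "file"
    base_path = extra["base_path"]  # assumed extra field, e.g. "my-bucket/landing"
    return ObjectStoragePath(f"{scheme}://{base_path}", conn_id=conn_id)


# The same DAG code runs in every environment; only the "storage" connection changes.
obj = storage_root("storage") / "my_file.txt"
```
Switching between dev, staging, and prod would then come down to pointing the `storage` connection at a different provider and base path, with no DAG changes.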
The PR sparked a great discussion (linked above), and I realized this
might be a good opportunity to collect **broader community
experience** around the use of `ObjectStoragePath` and object storage
in general.
A few questions I'd like to raise:
* How are you configuring access to object storage across environments?
* Do you find it useful to extract `scheme` and `base_path` from
connections (or any other configuration)?
* Are there existing best practices or patterns for making
`ObjectStoragePath` construction generic and environment-agnostic?
* Would it make sense to define a common utility or convention (e.g.,
via extras, `get_fs`, a provider's `filesystems`, or a connection
helper)?
I’m primarily looking for the best pattern—if any exists—or hoping we
can come together to define and document one as a community.
Best regards,
Josef Šimánek (https://github.com/simi)