On Thu, Aug 7, 2025 at 15:04, Kevin Yang <kevinsjy...@gmail.com> wrote:
>
> I may be joining this conversation late, but I would like to share/document a
> pattern in the hope that it provides more references for the discussion.
>
> In terms of deployment model, there are isolated environments, each with
> its own deployment of an Airflow instance and the required services.
>
> DEV Environment
>     DEV Airflow + other DEV infra
> PROD Environment
>     PROD Airflow + other PROD infra
>
> The secrets or configuration are usually already configured to be accessible
> from somewhere (e.g. environment variables, a vault service) with respect to
> each environment. The code is basically the same, but it fetches configuration
> data or secrets from the corresponding environment. For example,
>
> DEV Environment
>     "S3 bucket": s3://dev-bucket
> PROD Environment
>     "S3 bucket": s3://prod-bucket
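A minimal sketch of the per-environment pattern Kevin describes, assuming each deployment sets one environment variable (the name `STORAGE_URL` is hypothetical) holding the protocol, an optional connection id and the bucket. Putting the protocol in the same string is one way to cover a DEV on S3 / PROD on GCS split. The helper `split_storage_url` is illustrative only and is not part of Airflow.

```python
import os
from typing import Optional, Tuple
from urllib.parse import urlsplit


def split_storage_url(url: str) -> Tuple[str, Optional[str], str, str]:
    """Split a storage URL into (protocol, conn_id, bucket, path).

    Hypothetical helper for illustration: it reads the
    "protocol://conn_id@bucket/path" form, so one string per environment
    can carry the protocol, the connection id and the bucket.
    """
    parts = urlsplit(url)
    # username is None when the URL has no "conn_id@" part
    conn_id = parts.username
    # hostname is None for URLs such as file:///tmp/data (empty netloc);
    # note that urlsplit lower-cases the hostname
    bucket = parts.hostname or ""
    return parts.scheme, conn_id, bucket, parts.path


# Hypothetical per-environment values, e.g. set in the K8s manifest or .env:
#   DEV:  STORAGE_URL=s3://aws_dev@dev-bucket/data
#   PROD: STORAGE_URL=gs://gcp_prod@prod-bucket/data
storage_url = os.environ.get("STORAGE_URL", "file:///tmp/airflow-data")
protocol, conn_id, bucket, path = split_storage_url(storage_url)
```

Airflow's object storage tutorial builds paths such as `ObjectStoragePath("s3://aws_default@airflow-tutorial-data/")`, with the connection id in the user part of the URL. So in a real deployment the same string could probably be passed to `ObjectStoragePath` unchanged. The helper only makes the parts visible, for example for validation or logging. The connection itself still has to exist in each environment, for example through an `AIRFLOW_CONN_<CONN_ID>` environment variable.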
And how do you store connections? What if DEV uses S3 and PROD uses GCS?

> Under this deployment model, environment-specific configuration may not need
> to be configured at the operator level.
>
> Hope this helps,
> Kevin Yang
>
> Sent from Outlook for iOS<https://aka.ms/o0ukef>
> ________________________________
> From: Josef Šimánek <josef.sima...@gmail.com>
> Sent: Monday, August 4, 2025 6:57:08 AM
> To: dev@airflow.apache.org <dev@airflow.apache.org>
> Cc: Bolke de Bruin <bo...@apache.org>
> Subject: Re: [DISCUSS] Best practices for initializing `ObjectStoragePath`
>
> On Mon, Aug 4, 2025 at 12:42, Bolke de Bruin <bo...@apache.org> wrote:
> >
> > Josef is proposing to make ObjectStoragePath construction
> > environment-agnostic by storing provider and base path
> > in Airflow connections. So you just need to change the connection
> > configuration.
>
> I'm actually trying to make it provider-agnostic, not environment-agnostic.
> IMHO it is not possible to construct an ObjectStoragePath from a
> connection alone without specifying the protocol (gcs, s3, file...).
> I need to store at least two parts in ENV:
>
> - the connection itself (e.g. gcs with key paths)
> - protocol + bucket/base path (e.g. "gcs://my-test" in an Airflow
>   variable)
>
> Buckets usually differ per environment (since GCP bucket names
> are unique across the whole platform). In some environments it is also
> handy to use a different connection/protocol; for example, it is
> sometimes more convenient to use file locally (for development and
> debugging) and s3 in a deployed environment.
>
> I would prefer to set up ObjectStoragePath construction entirely
> through one configuration value (which could be passed in an
> environment variable).
>
> > This indeed makes for tighter coupling and makes me wonder about the
> > deployment model. While creating ObjectStoragePath I
> > had standard CI/CD practices in mind, where these things typically get set
> > through environment variables.
> > This does assume a deployment-wide setting, obviously, and is not
> > selectable at runtime. So to understand the case better, I'd like to
> > know more about the need for runtime selection.
>
> I set up connections and variables through environment variables and
> secrets in a K8s deployment to two environments (test, production).
> Locally I'm using docker-compose.yml with an .env file for development.
>
> > Care to clarify?
> >
> > Cheers
> > Bolke
> >
> > On Sun, 3 Aug 2025 at 22:28, Jarek Potiuk <ja...@potiuk.com> wrote:
> >
> > > Bolke (or others), maybe you can add something here and (re)ignite the
> > > discussion?
> > >
> > > On Tue, Jul 22, 2025 at 8:40 PM Josef Šimánek <josef.sima...@gmail.com>
> > > wrote:
> > >
> > > > Hi everyone,
> > > >
> > > > I've been experimenting with `ObjectStoragePath` and recently opened a
> > > > [PR](https://github.com/apache/airflow/pull/52002) aiming to simplify
> > > > its construction using Airflow connections, especially in cases where
> > > > environments (e.g., dev, staging, prod) differ primarily in object
> > > > storage provider (e.g., S3, GCS, file) and base path.
> > > >
> > > > The goal was to construct a reusable root path from a connection like
> > > > this:
> > > >
> > > > ```python
> > > > # `from_conn` is the constructor proposed in the PR above
> > > > storage = ObjectStoragePath.from_conn(BaseHook.get_connection("storage"))
> > > > obj = storage / "my_file.txt"
> > > > ```
> > > >
> > > > ...without needing to hardcode schemes like `s3://` or `gs://` and
> > > > base paths (usually "buckets") into the DAG code. The idea was to
> > > > infer provider and base path from connection `extra` fields (e.g.,
> > > > `provider`, `base_path`), allowing the same DAG code to work across
> > > > environments by simply reconfiguring the connection.
> > > >
> > > > The PR sparked a great discussion (linked above), and I realized this
> > > > might be a good opportunity to collect **broader community
> > > > experience** around the use of `ObjectStoragePath` and object storage
> > > > in general.
> > > >
> > > > A few questions I'd like to raise:
> > > >
> > > > * How are you configuring access to object storage across environments?
> > > > * Do you find it useful to extract `scheme` and `base_path` from
> > > >   connections (or any other configuration)?
> > > > * Are there existing best practices or patterns for making
> > > >   `ObjectStoragePath` construction generic and environment-agnostic?
> > > > * Would it make sense to define a common utility or convention (e.g.
> > > >   via extras, `get_fs`, provider's `filesystems`, or a connection
> > > >   helper)?
> > > >
> > > > I'm primarily looking for the best pattern, if any exists, or hoping we
> > > > can come together to define and document one as a community.
> > > >
> > > > Best regards,
> > > > Josef Šimánek (https://github.com/simi)
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
> > > > For additional commands, e-mail: dev-h...@airflow.apache.org
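To make the question about extracting `scheme` and `base_path` from connections concrete, here is a minimal sketch of the extras-based convention from the original proposal. It assumes a connection whose extras contain `provider` and `base_path` (the names used in the thread). `ConnectionStub` and `storage_root_url` are hypothetical stand-ins so the sketch runs without Airflow installed. This is not the API from the PR.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class ConnectionStub:
    """Stand-in for Airflow's Connection; only conn_id and extra_dejson are used."""

    conn_id: str
    extra_dejson: Dict[str, Any] = field(default_factory=dict)


def storage_root_url(conn) -> str:
    """Build a root URL from hypothetical `provider` and `base_path` extras.

    Sketch of the convention discussed in the thread, not an existing Airflow API.
    """
    extras = conn.extra_dejson
    provider = extras.get("provider")
    if not provider:
        raise ValueError(f"connection {conn.conn_id!r} has no 'provider' extra")
    base_path = str(extras.get("base_path", "")).strip("/")
    if provider == "file":
        # Local storage: no connection id, base_path is an absolute directory.
        return f"file:///{base_path}"
    # Remote storage: embed the connection id as the URL user part,
    # e.g. s3://storage@dev-bucket/
    return f"{provider}://{conn.conn_id}@{base_path}/"


dev = ConnectionStub("storage", {"provider": "s3", "base_path": "dev-bucket"})
local = ConnectionStub("storage", {"provider": "file", "base_path": "/tmp/data"})
print(storage_root_url(dev))    # s3://storage@dev-bucket/
print(storage_root_url(local))  # file:///tmp/data
```

With Airflow installed, `conn` would presumably come from `BaseHook.get_connection("storage")`, since Airflow's `Connection` exposes `extra_dejson`. The resulting string would then be passed to `ObjectStoragePath(...)` before appending `/ "my_file.txt"`. The design keeps the S3 / GCS / local-file switch in the connection rather than in DAG code, which is the goal of the proposal. The exact provider strings (`s3`, `gs`, `file`) are assumptions.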