That could be a vault service, for example the HashiCorp Vault secrets backend:
https://airflow.apache.org/docs/apache-airflow-providers-hashicorp/stable/secrets-backends/hashicorp-vault.html

Think of it as a key-value store, and look back at that example from this 
perspective:

DEV Environment
“key”: FILE_PATH
“value”: file://some_path

PROD Environment
“key”: FILE_PATH
“value”: gs://some_path

Assuming the value is mounted into the pod at initialization when a task is 
triggered with KubernetesPodOperator, the application running in the pod can 
then pull environment-specific values using the same key. This is just one 
example, and there are multiple ways to implement it. The differences come 
down to how things are set up and to the deployment model.
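
As a rough illustration of the "same key, different value" idea, here is a 
minimal sketch. It assumes the value ends up in the pod as an environment 
variable named FILE_PATH; the helper name `resolve_storage_url` is 
hypothetical and not part of Airflow. A vault client or a mounted file would 
work just as well.

```python
import os
from urllib.parse import urlsplit


def resolve_storage_url(key: str = "FILE_PATH") -> tuple[str, str]:
    """Return (scheme, url) for the value stored under `key`.

    Assumption: the deployment exposes the value as an environment variable,
    for example through a secret mounted into the pod. This helper is
    hypothetical and only illustrates the lookup.
    """
    url = os.environ[key]  # raises KeyError if the environment did not set it
    scheme = urlsplit(url).scheme
    if not scheme:
        raise ValueError(
            f"{key} must include a scheme such as file:// or gs://, got {url!r}"
        )
    return scheme, url
```

The application code stays identical in DEV and PROD; only the value behind 
FILE_PATH changes, and the returned URL could then be handed to whatever 
builds the storage path.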

Hope this helps,
Kevin Yang

________________________________
From: Josef Šimánek <josef.sima...@gmail.com>
Sent: Thursday, August 7, 2025 9:20:25 AM
To: dev@airflow.apache.org <dev@airflow.apache.org>
Cc: Bolke de Bruin <bo...@apache.org>
Subject: Re: [DISCUSS] Best practices for initializing `ObjectStoragePath`

On Thu, 7 Aug 2025 at 15:04, Kevin Yang <kevinsjy...@gmail.com> wrote:
>
> I may be joining this conversation late, but I would like to share and document a 
> pattern that I hope provides more reference points for the discussion.
>
> In terms of deployment model, there are isolated environments, each with 
> their own deployment of Airflow instance and the required services.
>
> DEV Environment > DEV Airflow + other DEV infra
> PROD Environment > PROD Airflow + other PROD infra
>
> The secrets or configuration usually are already configured to be accessible 
> from somewhere (e.g. environment variables, vault service) with respect to 
> each environment. The code is basically the same but it fetches configuration 
> data or secrets from the corresponding environment. For example,
>
> DEV Environment > “S3 bucket” > s3://dev-bucket
> PROD Environment > “S3 bucket” > s3://prod-bucket

And how do you store connections? What if DEV will be S3 and PROD will be GCS?

> Under this deployment model, environment-specific configuration may not need to 
> be configured at the operator level.
>
> Hope this helps,
> Kevin Yang
>
> ________________________________
> From: Josef Šimánek <josef.sima...@gmail.com>
> Sent: Monday, August 4, 2025 6:57:08 AM
> To: dev@airflow.apache.org <dev@airflow.apache.org>
> Cc: Bolke de Bruin <bo...@apache.org>
> Subject: Re: [DISCUSS] Best practices for initializing `ObjectStoragePath`
>
> On Mon, 4 Aug 2025 at 12:42, Bolke de Bruin <bo...@apache.org> wrote:
> >
> >  Josef is proposing to make ObjectStoragePath construction
> > environment-agnostic by storing provider and base path
> > in Airflow connections. So you just need to change the connection
> > configuration.
>
> I'm trying to make it actually provider agnostic, not environment
> agnostic. IMHO it is not possible to simply construct
> ObjectStoragePath just from connection without specifying the protocol
> (gcs, s3, file...). I need to store at least 2 parts in ENV:
>
> - connection itself (like gcs with key paths)
> - protocol + bucket/base path (like "gcs://my-test" in Airflow
> variable for example)
>
> Buckets are usually different per environment (since GCP bucket names
> are unique across the whole platform). In some environments it could
> be handy to also use different connections/protocol like sometimes it
> is more friendly to use file locally (for development/debugging
> purpose) and s3 on a deployed environment.
>
> I would prefer to make the whole setup for ObjectStoragePath
> construction through one configuration (possible to be passed in an
> environment variable).
>
> > This makes indeed a tighter coupling and makes me wonder about the
> > deployment model. While creating ObjectStoragePath I
> > had standard CI/CD practices in mind, where these things typically get set
> > through environment variables. This does
> > assume a deployment-wide setting obviously and is not runtime selectable.
> > So to understand the case better I would like to
> > know more about the need for runtime selection.
>
> I do set up connections and variables through environment variables and
> secrets in K8s deployment to 2 environments (test, production).
> Locally I'm using docker-compose.yml with .env file for development.
>
> > Care to clarify?
> >
> > Cheers
> > Bolke
> >
> >
> >
> >
> > On Sun, 3 Aug 2025 at 22:28, Jarek Potiuk <ja...@potiuk.com> wrote:
> >
> > > Bolke (or others) - maybe you can add something here and (re) ignite the
> > > discussion ?
> > >
> > > On Tue, Jul 22, 2025 at 8:40 PM Josef Šimánek <josef.sima...@gmail.com>
> > > wrote:
> > >
> > > > Hi everyone,
> > > >
> > > > I've been experimenting with `ObjectStoragePath` and recently opened a
> > > > [PR](https://github.com/apache/airflow/pull/52002) aiming to simplify
> > > > its construction using Airflow connections — especially in cases where
> > > > environments (e.g., dev, staging, prod) differ primarily in object
> > > > storage provider (e.g., S3, GCS, file) and base path.
> > > >
> > > > The goal was to construct a reusable root path from a connection like
> > > this:
> > > >
> > > > ```python
> > > > storage = ObjectStoragePath.from_conn(BaseHook.get_connection("storage"))
> > > > object = storage / "my_file.txt"
> > > > ```
> > > >
> > > > ...without needing to hardcode schemes like `s3://` or `gs://` and
> > > > base paths (usually "buckets") into the DAG code. The idea was to
> > > > infer provider and base path from connection `extra` fields (e.g.,
> > > > `provider`, `base_path`), allowing the same DAG code to work across
> > > > environments by simply reconfiguring the connection.
> > > >
> > > > The PR sparked a great discussion (linked above), and I realized this
> > > > might be a good opportunity to collect **broader community
> > > > experience** around the use of `ObjectStoragePath` and object storage
> > > > in general.
> > > >
> > > > A few questions I'd like to raise:
> > > >
> > > > * How are you configuring access to object storage across environments?
> > > > * Do you find it useful to extract `scheme` and `base_path` from
> > > > connections (or any other configuration)?
> > > > * Are there existing best practices or patterns for making
> > > > `ObjectStoragePath` construction generic and environment-agnostic?
> > > > * Would it make sense to define a common utility or convention (e.g.
> > > > via extras, `get_fs`, provider's `filesystems`, or a connection
> > > > helper)?
> > > >
> > > > I’m primarily looking for the best pattern—if any exists—or hoping we
> > > > can come together to define and document one as a community.
> > > >
> > > > Best regards,
> > > > Josef Šimánek (https://github.com/simi)
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
> > > > For additional commands, e-mail: dev-h...@airflow.apache.org
> > > >
> > > >
> > >
>
