On Thu, Aug 7, 2025 at 15:04, Kevin Yang <kevinsjy...@gmail.com> wrote:
>
> I may be joining this conversation late, but I would like to share and 
> document a pattern that I hope can serve as another reference for the discussion.
>
> In terms of deployment model, there are isolated environments, each with 
> its own Airflow deployment and the services it requires.
>
> DEV Environment > DEV Airflow + other DEV infra
> PROD Environment > PROD Airflow + other PROD infra
>
> Secrets and configuration are usually already accessible from somewhere 
> (e.g. environment variables, a vault service) within each environment. The 
> code is basically the same, but it fetches configuration data or secrets 
> from the environment it runs in. For example:
>
> DEV Environment > “S3 bucket” > s3://dev-bucket
> PROD Environment > “S3 bucket” > s3://prod-bucket

And how do you store connections? What if DEV uses S3 and PROD uses GCS?
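
The case in question, where both the provider and the bucket differ per environment, can be sketched as follows. This is only a minimal sketch under assumptions: the environment variable names `STORAGE_BASE` and `STORAGE_CONN_ID` and both helper functions are hypothetical, and the import path for `ObjectStoragePath` depends on the Airflow version.

```python
import os
from urllib.parse import urlsplit

# Hypothetical variable names, chosen for this example only.
BASE_VAR = "STORAGE_BASE"      # e.g. "s3://dev-bucket" or "gcs://prod-bucket"
CONN_VAR = "STORAGE_CONN_ID"   # e.g. "storage"; may be unset for local files


def storage_settings(env=os.environ):
    """Return (base_url, conn_id) as configured for the current environment."""
    base = env[BASE_VAR]
    # The protocol has to be part of the value; a bare bucket name is rejected.
    if not urlsplit(base).scheme:
        raise ValueError(f"{BASE_VAR} must include a protocol, got {base!r}")
    return base, env.get(CONN_VAR) or None


def storage_root(env=os.environ):
    """Build the root ObjectStoragePath; Airflow is imported only when called."""
    try:
        from airflow.sdk import ObjectStoragePath  # assumed Airflow 3 location
    except ImportError:
        from airflow.io.path import ObjectStoragePath  # assumed Airflow 2.8+ location
    base, conn_id = storage_settings(env)
    return ObjectStoragePath(base, conn_id=conn_id)
```

With something like this, DAG code would only ever do `storage_root() / "my_file.txt"`, and DEV could set `STORAGE_BASE=s3://dev-bucket` while PROD sets `STORAGE_BASE=gcs://prod-bucket`, each with its own connection.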

> Under this deployment model, environment-specific configuration does not 
> need to be set at the operator level.
>
> Hope this helps,
> Kevin Yang
>
> ________________________________
> From: Josef Šimánek <josef.sima...@gmail.com>
> Sent: Monday, August 4, 2025 6:57:08 AM
> To: dev@airflow.apache.org <dev@airflow.apache.org>
> Cc: Bolke de Bruin <bo...@apache.org>
> Subject: Re: [DISCUSS] Best practices for initializing `ObjectStoragePath`
>
> On Mon, Aug 4, 2025 at 12:42, Bolke de Bruin <bo...@apache.org> wrote:
> >
> >  Josef is proposing to make ObjectStoragePath construction
> > environment-agnostic by storing provider and base path
> > in Airflow connections. So you just need to change the connection
> > configuration.
>
> Actually, I'm trying to make it provider-agnostic, not
> environment-agnostic. IMHO it is not possible to construct an
> ObjectStoragePath from a connection alone, without specifying the protocol
> (gcs, s3, file...). I need to store at least 2 parts in ENV:
>
> - connection itself (like gcs with key paths)
> - protocol + bucket/base path (like "gcs://my-test" in Airflow
> variable for example)
>
> Buckets are usually different per environment (since GCP bucket names
> are unique across the whole platform). In some environments it can
> also be handy to use a different connection or protocol: for example,
> file is friendlier locally (for development and debugging), with s3
> on a deployed environment.
>
> I would prefer to do the whole setup for ObjectStoragePath
> construction through one configuration value (which could be passed in an
> environment variable).
>
> > This does indeed create tighter coupling and makes me wonder about the
> > deployment model. While creating ObjectStoragePath I had standard CI/CD
> > practices in mind, where these things typically get set through
> > environment variables. This obviously assumes a deployment-wide setting
> > that is not runtime selectable. So to understand the case better, I'd
> > like to know more about the need for runtime selection.
>
> I set up connections and variables through environment variables and
> secrets in a K8s deployment to 2 environments (test, production).
> Locally I'm using docker-compose.yml with a .env file for development.
>
> > Care to clarify?
> >
> > Cheers
> > Bolke
> >
> >
> >
> >
> > On Sun, 3 Aug 2025 at 22:28, Jarek Potiuk <ja...@potiuk.com> wrote:
> >
> > > Bolke (or others) - maybe you can add something here and (re)ignite the
> > > discussion?
> > >
> > > On Tue, Jul 22, 2025 at 8:40 PM Josef Šimánek <josef.sima...@gmail.com>
> > > wrote:
> > >
> > > > Hi everyone,
> > > >
> > > > I've been experimenting with `ObjectStoragePath` and recently opened a
> > > > [PR](https://github.com/apache/airflow/pull/52002) aiming to simplify
> > > > its construction using Airflow connections — especially in cases where
> > > > environments (e.g., dev, staging, prod) differ primarily in object
> > > > storage provider (e.g., S3, GCS, file) and base path.
> > > >
> > > > The goal was to construct a reusable root path from a connection like this:
> > > >
> > > > ```python
> > > > storage = ObjectStoragePath.from_conn(BaseHook.get_connection("storage"))
> > > > object = storage / "my_file.txt"
> > > > ```
> > > >
> > > > ...without needing to hardcode schemes like `s3://` or `gs://` and
> > > > base paths (usually "buckets") into the DAG code. The idea was to
> > > > infer provider and base path from connection `extra` fields (e.g.,
> > > > `provider`, `base_path`), allowing the same DAG code to work across
> > > > environments by simply reconfiguring the connection.
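
The inference described in the quoted proposal could look roughly like this. It is a minimal sketch, assuming the `provider` and `base_path` keys suggested above are present in the connection's `extra` JSON; `root_url_from_extra` is a hypothetical helper and not part of Airflow or of the PR.

```python
import json


def root_url_from_extra(extra_json):
    """Build a root URL such as "s3://dev-bucket" from a connection's extra JSON.

    Hypothetical helper: the `provider` and `base_path` keys follow the
    proposal in this thread and are not an established Airflow convention.
    """
    extra = json.loads(extra_json or "{}")
    try:
        provider = extra["provider"]
        base_path = extra["base_path"]
    except KeyError as missing:
        raise ValueError(f"connection extra lacks {missing}") from None
    # A leading slash in base_path is kept, so "file" + "/tmp/data"
    # yields "file:///tmp/data".
    return f"{provider}://{base_path}"
```

In a DAG the input would come from the connection itself, for example `BaseHook.get_connection("storage").extra`, and the resulting URL would still need to be passed to `ObjectStoragePath` together with the connection id.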
> > > >
> > > > The PR sparked a great discussion (linked above), and I realized this
> > > > might be a good opportunity to collect **broader community
> > > > experience** around the use of `ObjectStoragePath` and object storage
> > > > in general.
> > > >
> > > > A few questions I'd like to raise:
> > > >
> > > > * How are you configuring access to object storage across environments?
> > > > * Do you find it useful to extract `scheme` and `base_path` from
> > > > connections (or any other configuration)?
> > > > * Are there existing best practices or patterns for making
> > > > `ObjectStoragePath` construction generic and environment-agnostic?
> > > > * Would it make sense to define a common utility or convention (e.g.
> > > > via extras, `get_fs`, provider's `filesystems`, or a connection
> > > > helper)?
> > > >
> > > > I’m primarily looking for the best pattern—if any exists—or hoping we
> > > > can come together to define and document one as a community.
> > > >
> > > > Best regards,
> > > > Josef Šimánek (https://github.com/simi)
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
> > > > For additional commands, e-mail: dev-h...@airflow.apache.org
> > > >
> > > >
> > >
>
>
