Thank you Bolke!
Interesting read.

I have a question about the pain point we are trying to solve here. Most
use cases I have encountered were about the need to sync DAGs from a branch
in GitHub (or equivalent) to the Airflow DAG folder.
Correct me if I am wrong, but this AIP does not handle that. A sync
component will still be required to sync from git to S3/GCS/other storage,
and this AIP solves only the part where Airflow machines fetch the files
from storage. Is that correct?
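
To make the question concrete, the part the AIP appears to cover (loading
DAG source through a Path-like object instead of direct filesystem access)
could be sketched roughly as below. This is only an illustration, not the
code from the PR: `PathSourceLoader` and `load_module_from_path` are
hypothetical names, and a local `pathlib.Path` stands in for an
ObjectStoragePath. Nothing here syncs from git, which is the gap being
asked about.

```python
import importlib.abc
import importlib.util
import tempfile
from pathlib import Path


class PathSourceLoader(importlib.abc.SourceLoader):
    """Hypothetical loader that reads module source via the Path API only."""

    def __init__(self, path):
        # Any object offering read_bytes() works; assumed to be Path-like.
        self._path = path

    def get_filename(self, fullname):
        # Used for tracebacks and module.__file__; not opened directly.
        return str(self._path)

    def get_data(self, path):
        # Ignore the string path and read through the Path-like object.
        return self._path.read_bytes()


def load_module_from_path(name, path):
    """Build and execute a module whose source lives behind `path`."""
    loader = PathSourceLoader(path)
    spec = importlib.util.spec_from_loader(name, loader, origin=str(path))
    module = importlib.util.module_from_spec(spec)
    loader.exec_module(module)
    return module


if __name__ == "__main__":
    # Stand-in for remote storage: a temporary local directory.
    with tempfile.TemporaryDirectory() as tmp:
        dag_file = Path(tmp) / "example_dag.py"
        dag_file.write_text("VALUE = 41 + 1\n")
        module = load_module_from_path("example_dag", dag_file)
        print(module.VALUE)
```

In this sketch something still has to put `example_dag.py` into storage in
the first place; the loader only handles the read side.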

On Sun, May 26, 2024 at 10:55 AM Bolke de Bruin <[email protected]> wrote:

> Hi All,
>
> I would like to discuss a new AIP aimed at enhancing the DAG loading
> mechanism to support reading DAGs from ephemeral storage solutions. This
> proposal is intended to supersede AIP-5 Remote DAG Fetcher and provide a
> more flexible and scalable approach and to prepare for AIP-63.
>
>
> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-71+Generalizing+DAG+Loader+and+Processor+for+Ephemeral+Storage
>
> *Abstract*
> This proposal aims to generalize the DAG loader and processor to use
> pathlib.Path for file operations instead of assuming direct OS filesystem
> access. It includes implementing a custom module loader that supports
> loading from ObjectStoragePath locations and other Path-like abstractions,
> with caching capabilities provided by fsspec. Furthermore, while this AIP
> does not directly implement DAG versioning, it creates a foundational layer
> that can be extended to support DAG versioning as outlined in AIP-63.
>
> A work in progress PR can be found here:
> https://github.com/apache/airflow/pull/39647
>
> *Key points for discussion*
>
> Previous proposals, like AIP-5, suggested using a Fetcher mechanism, kind
> of like an in-process git-sync. This proposal is about making that
> redundant by fully supporting object storage locations, leveraging
> ObjectStoragePath and fsspec caching mechanisms.
>
> Earlier feedback on AIP-5 was that an additional Fetcher process was out
> of scope for the project. With the transient integration of pathlib.Path
> and ObjectStoragePath, I think this argument does not hold anymore, and
> the demand is there. In addition, the added flexibility makes AIP-63
> easier to implement (what that looks like remains to be seen).
>
> Airflow scans DAGs often. This very likely requires a caching mechanism for
> both the DAGs and their modules. fsspec implements caching, and the plan
> is to leverage it.
>
> Non-DAG, non-module assets in the DAG folder are out of scope. For
> example, if for some reason you include a GIF, it will not
> automatically be available without changes to your code.
>
> I kindly request your thoughts :-).
>
> Bolke
>
> --
>
> --
> Bolke de Bruin
> [email protected]
>
