Looking forward to it. We have been talking about making Airflow data-aware for a long time :)
On Thu, 17 Feb 2022 at 17:07, Ash Berlin-Taylor <[email protected]> wrote:

> Hi everyone,
>
> I'd like to start a discussion about a new AIP that we've been thinking
> about at Astronomer and that has been kicking around our heads since before
> I started preparing my Keynote for Airflow Summit 2021! At the time I
> called it a "New Concept: Data object".
>
> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-48+Data+Dependency+Management+and+Data+Driven+Scheduling
>
> This AIP has gone through a number of rounds of growing and shrinking
> before we finally ended up at what we think is the core foundation of
> the idea that fixes a real need of our users right away, and that gives us
> the foundation to add lots of cool features in the future.
>
> In an attempt to distil the essence of the AIP for those of the tl;dr
> persuasion:
>
> We want to make Airflow aware of the datasets that tasks and DAGs consume
> and produce.
>
> We want to allow DAGs to be triggered based on datasets being updated, no
> longer just time based schedules.
>
> We would like to add the foundation of automatic Data movement (reading
> and writing).
>
> There is a lot more detail in the AIP, but this is the core of our idea.
>
> We'd love your feedback, and none of the code shown is set in stone so I'm
> happy to hear ideas on how to improve the DAG writer experience.
>
> Ash and Vikram,
> Astronomer.io
