Hi everyone,

I'd like to start a discussion about a new AIP that we've been thinking about at Astronomer, one that has been kicking around our heads since before I started preparing my keynote for Airflow Summit 2021! At the time I called it a "New Concept: Data object".

<https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-48+Data+Dependency+Management+and+Data+Driven+Scheduling>

This AIP has gone through several rounds of growing and shrinking, and we've finally ended up at what we think is the core of the idea: it addresses a real need of our users right away, and it gives us the foundation to add lots of cool features in the future.

In an attempt to distil the essence of the AIP for those of the tl;dr persuasion:

We want to make Airflow aware of the datasets that tasks and DAGs consume and produce.

We want to allow DAGs to be triggered based on datasets being updated, no longer just time based schedules.

We would like to add the foundation for automatic data movement (reading and writing).
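To make the data-driven scheduling idea a little more concrete, here is a toy sketch in plain Python. It is a hypothetical model, not Airflow's API and not the interface proposed in the AIP: `Dataset`, `Dag`, `DatasetScheduler` and `record_update` are illustrative names, and it assumes one possible rule, namely that a DAG is triggered once every dataset it consumes has been updated since its last run.

```python
from dataclasses import dataclass, field


# Hypothetical model (not Airflow's API): a dataset is identified by a URI.
@dataclass(frozen=True)
class Dataset:
    uri: str


@dataclass
class Dag:
    dag_id: str
    consumes: set[Dataset]
    # Datasets updated since this DAG was last triggered.
    pending: set[Dataset] = field(default_factory=set)


class DatasetScheduler:
    """Assumed rule: trigger a DAG once all datasets it consumes are updated."""

    def __init__(self, dags: list[Dag]) -> None:
        self.dags = dags
        self.triggered: list[str] = []

    def record_update(self, dataset: Dataset) -> None:
        # Would be called when a producing task successfully writes `dataset`.
        for dag in self.dags:
            if dataset in dag.consumes:
                dag.pending.add(dataset)
                if dag.pending == dag.consumes:
                    self.triggered.append(dag.dag_id)
                    dag.pending.clear()  # wait for a fresh set of updates


orders = Dataset("s3://bucket/orders")
customers = Dataset("s3://bucket/customers")
report = Dag("daily_report", consumes={orders, customers})
sched = DatasetScheduler([report])

sched.record_update(orders)
print(sched.triggered)  # []
sched.record_update(customers)
print(sched.triggered)  # ['daily_report']
```

Requiring all inputs before triggering is only one design choice; a real implementation might instead trigger on any single update, or add time windows, and that is exactly the kind of detail this discussion is meant to settle.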

There is a lot more detail in the AIP, but this is the core of our idea.

We'd love your feedback. None of the code shown in the AIP is set in stone, so I'm happy to hear ideas on how to improve the DAG writer experience.

Ash and Vikram,
Astronomer.io
