dstandish opened a new pull request, #24908: URL: https://github.com/apache/airflow/pull/24908
Exploration: add a dataset event log table that stores all dataset events, even those which are ignored for the purpose of dag run queue records (e.g. when the dataset has already been updated once and the dag run is still waiting on other datasets). This opens up some possibilities for more elaborate triggering behavior, and also provides a way to scrutinize / surface / visualize the full history of dataset events (the queue table's records are ephemeral).

Even before we have a chance to make enhancements beyond 2.4, I suspect it would be pretty easy for a user to write a long-running deferrable operator that consumes dataset events and does things with them, e.g. implements more complex dag run triggering rules based on the full event history.

One component I'm just starting to play with and think about is how to get information to the consuming dag about what actually happened in the dataset event -- e.g. was the dataset appended to, or replaced, this kind of thing. You can see in the change to example_datasets.py that I'm using post_execute to insert some metadata into a dataset event payload.
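To make the idea concrete, here is a minimal, self-contained sketch of the kind of triggering rule a consumer could evaluate over the full event log. The names (`DatasetEvent`, `should_trigger`, the `"operation"` metadata key) are hypothetical illustrations, not Airflow APIs:

```python
from dataclasses import dataclass, field
from datetime import datetime

# Hypothetical record shape for a row in the proposed dataset event log
# table. The `extra` dict stands in for metadata a producing task might
# attach via post_execute (e.g. whether it appended or replaced data).
@dataclass
class DatasetEvent:
    dataset_uri: str
    timestamp: datetime
    extra: dict = field(default_factory=dict)

def should_trigger(events: list[DatasetEvent], required_uris: set[str]) -> bool:
    """Example of a richer rule than the queue table allows: trigger only
    when every required dataset has at least one event whose metadata
    says it was fully replaced, not merely appended to."""
    replaced = {
        e.dataset_uri
        for e in events
        if e.extra.get("operation") == "replace"
    }
    return required_uris <= replaced
```

Because the log keeps every event (not just the first one per dag run), a rule like this can look across the whole history rather than a single ephemeral queue record.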
